An Approach for Multi-Document Text Summarization Using Extreme Learning Machine and LexRank

Authors

  • Wedad Abdul Khuder Naser, Mustansiriyah University, Iraq

DOI:

https://doi.org/10.31695/IJERAT.2021.3704

Keywords:

Summarization, DUC-2002, LexRank, Extreme Learning Machine

Abstract

Due to the exponential growth of online textual data and the variety of its sources, there is a need to produce accurate text summaries with minimal time and effort. Extractive multi-document summarization methods automatically generate a summary from a document collection, covering the main content while avoiding redundant information. This study proposes a new method for extractive multi-document summarization that combines supervised and unsupervised learning. In the supervised stage, a set of seven features is extracted from each sentence in the document collection and fed to an Extreme Learning Machine (ELM), which distinguishes important from unimportant sentences. In the unsupervised stage, a graph of the important sentences is built and the sentences are scored by the LexRank algorithm. The performance of the proposed method was evaluated on the DUC-2002 dataset using ROUGE metrics. The proposed method achieved a ROUGE score of 0.47472 for 200-word summaries and 0.54641 for 400-word summaries.
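
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not name the seven features, the hidden-layer size, the activation function, the similarity measure, or the graph threshold, so every one of these is an assumption made here for illustration (sigmoid activation, bag-of-words cosine similarity, a 0.1 edge threshold, PageRank-style damping of 0.85). The function names `train_elm`, `elm_predict`, `bow_cosine`, `lexrank`, and `summarize` are hypothetical.

```python
import numpy as np


def train_elm(X, y, n_hidden=20, seed=0):
    """Fit a basic ELM: fixed random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid activations (assumed)
    beta = np.linalg.pinv(H) @ y                 # Moore-Penrose least-squares solution
    return W, b, beta


def elm_predict(X, model):
    """Return raw ELM outputs; values above 0.5 are treated as 'important'."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta


def bow_cosine(sentences):
    """Cosine similarity between bag-of-words vectors (assumed similarity measure)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(sentences), len(vocab)))
    for row, s in enumerate(sentences):
        for w in s.lower().split():
            M[row, index[w]] += 1.0
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    M = M / np.where(norms == 0, 1.0, norms)     # avoid division by zero
    return M @ M.T


def lexrank(sim, threshold=0.1, damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences by power iteration on a thresholded similarity graph."""
    adj = (sim > threshold).astype(float)
    np.fill_diagonal(adj, 1.0)                   # self-loops keep every row non-empty
    P = adj / adj.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    n = adj.shape[0]
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = (1.0 - damping) / n + damping * (P.T @ scores)
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences, features, model, k=2):
    """Keep sentences the ELM marks important, then return the top-k by LexRank."""
    keep = np.flatnonzero(elm_predict(features, model) > 0.5)
    if keep.size == 0:
        return []
    kept = [sentences[i] for i in keep]
    scores = lexrank(bow_cosine(kept))
    top = np.argsort(-scores)[:k]
    return [kept[i] for i in sorted(top)]        # restore original sentence order
```

In this sketch, `features` would be an array with one row of seven values per sentence and `y` a vector of 0/1 importance labels; a real summarizer would also cut the output at a word budget (200 or 400 words in the reported experiments) rather than at a fixed sentence count `k`.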

Published

2021-05-25

How to Cite

Wedad Abdul Khuder Naser. (2021). An Approach for Multi-Document Text Summarization Using Extreme Learning Machine and LexRank. International Journal of Engineering Research and Advanced Technology (ijerat), 7(5), 19-28. https://doi.org/10.31695/IJERAT.2021.3704
