Multilingual Text Summarization using Deep Learning
Keywords: Summarization, TextRank, CNN, ROUGE, TAC-2011
With the rapid growth of big data and the internet, organizing the vast amount of online information has become a primary concern. This online textual data leads to information overload and redundancy. Multi-document summarization is one solution to this problem: it extracts the main ideas of a set of documents and condenses them into a short summary. Summarization should preserve the major concepts and the meaning of the original text. This paper proposes a new method for multi-document summarization. The method extracts six features from each sentence in the studied collection; these features must be language-independent. The resulting feature vectors are fed to a Convolutional Neural Network (CNN), which classifies each sentence as either a summary or a non-summary sentence. A graph of the summary sentences is then generated, and the sentences are scored with the TextRank algorithm. The implemented system was evaluated on both the English and Arabic versions of the TAC-2011 MultiLing Pilot dataset using ROUGE metrics. The proposed method achieved average F-measures of 0.46079 and 0.20664 using ROUGE-1 and ROUGE-2, respectively, for English documents, and 0.45624 and 0.30725, respectively, for Arabic documents.
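As an illustration of the graph-ranking step only, the following minimal Python sketch scores sentences with TextRank. It is not the implementation evaluated in the paper: the six sentence features and the CNN classifier are not reproduced, and the function names, the word-overlap similarity, the damping factor of 0.85, and the fixed iteration count are all assumptions made for the example.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into Unicode-aware word tokens."""
    return re.findall(r"\w+", sentence.lower())


def overlap_similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences.

    Assumption: this is the similarity suggested in the original TextRank
    paper; the summarized paper may use a different edge weight.
    """
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Return one TextRank score per sentence (higher means more central)."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Build a symmetric weighted graph: one node per sentence.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = overlap_similarity(tokens[i], tokens[j])
            weights[i][j] = weights[j][i] = w

    # Total edge weight leaving each node, used to normalize contributions.
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) + damping * rank)
        scores = updated
    return scores


if __name__ == "__main__":
    # Hypothetical input: sentences already labeled "summary" by a classifier.
    candidates = [
        "The cat sat on the mat.",
        "The cat lay on the mat near the door.",
        "Stock prices fell sharply today.",
    ]
    ranked = sorted(zip(textrank(candidates), candidates), reverse=True)
    for score, sentence in ranked:
        print(f"{score:.3f}  {sentence}")
```

In this sketch a sentence that shares no words with any other sentence receives only the baseline score of 1 - damping, so it ranks below sentences that are connected in the graph. A final summary could then be assembled from the top-ranked sentences up to a length limit.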
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.