Techniques of Data Deduplication for Cloud Storage: A Review


  • Ayad Hasan Adhab Kufa University, Computer Science and Mathematics College, Iraq
  • Naseer Ali Hussien Wasit University, College of Education for Pure Science, Iraq



Cloud Computing, Cloud Storage Service, Chunking Algorithm, Data Deduplication, Chunking Method


Information technology and networks are advancing so rapidly that it is becoming increasingly difficult to keep up, and as data centers expand, energy consumption accounts for a growing share of IT investment. As the amount of digital data grows, so does the need for storage space, which drives up the cost of backups and degrades their performance. Traditional backup solutions have no built-in protection against storing duplicate data. Duplicate data severely lengthens backup times and consumes resources needlessly. Data deduplication is critical for removing redundant data and lowering storage costs. It is a data compression technique that improves storage efficiency by handling duplicate data more effectively: a single copy of the data is uploaded to storage, and each subsequent copy is replaced with a pointer to the original stored copy. This paper presents an extensive literature survey and summarizes the storage approaches, concepts, and categories used in data deduplication. It also surveys chunk-based data deduplication techniques in detail.
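As an illustration of the pointer-based idea described in the abstract, the following is a minimal sketch rather than any method from the surveyed papers. It assumes fixed-size chunking and SHA-256 fingerprints; the class name `DedupStore`, its methods, and the tiny chunk size are hypothetical choices made only for this example.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each unique chunk.

    Assumptions (not from the paper): fixed-size chunks, SHA-256
    fingerprints, and everything held in Python dictionaries.
    """

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, stored once
        self.files = {}   # file name -> list of fingerprints ("pointers")

    def put(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        pointers = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # A duplicate chunk is not stored again; only a pointer is kept.
            self.chunks.setdefault(fingerprint, chunk)
            pointers.append(fingerprint)
        self.files[name] = pointers

    def get(self, name):
        """Rebuild a file by following its pointers to the stored chunks."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self):
        """Total bytes physically kept after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("a.bin", b"AAAABBBBAAAA")
store.put("b.bin", b"AAAABBBBCCCC")
print(len(store.chunks), store.stored_bytes())
```

In this sketch the two files total 24 bytes of input, but only the three distinct chunks are physically kept. Real systems typically use much larger chunks and persistent indexes, and content-defined chunking is a common alternative to the fixed-size split assumed here.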



How to Cite

Ayad Hasan Adhab, & Naseer Ali Hussien. (2022). Techniques of Data Deduplication for Cloud Storage: A Review. International Journal of Engineering Research and Advanced Technology (ijerat) (E-ISSN 2454-6135) DOI: 10.31695/IJERAT, 8(4), 7–18.



Section 1: Computer Science & information Engineering