What is data deduplication and how does it optimize information storage?

Deduplication is the process of detecting and eliminating redundant data, such as repeated blocks or duplicate copies of files, in order to save disk space. It is also sometimes used to reduce the amount of data sent over a network.

Broadly, there are two approaches to deduplication:

At the block level. The stored data is divided into blocks according to some scheme. A block is a logical unit of data with a bounded size, such as 4 KB. If identical blocks are detected, only one copy is kept as the “original”, and each duplicate block is replaced by a reference to it.
At the file level. Here, data is compared as whole files. If a new file is identical to an existing one, only a reference to the original is stored in its place. If the new file is unique, it is stored in full.
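
Both approaches can be illustrated with a minimal in-memory sketch. It assumes fixed-size 4 KB blocks and treats two pieces of data as identical when their SHA-256 digests match; real systems may use other chunking schemes, other hashes, or an extra byte-by-byte check. The class names `BlockStore` and `FileStore` are hypothetical and exist only for this illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KB, the example block size used in the text


class BlockStore:
    """Toy block-level deduplicating store (illustrative only)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (the single kept "original")
        self.files = {}   # file name -> list of digests (the references)

    def put(self, name, data):
        refs = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Keep the block only if no identical block is stored yet.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name):
        # Rebuild the file by following its references in order.
        return b"".join(self.blocks[d] for d in self.files[name])


class FileStore:
    """Toy file-level deduplicating store (illustrative only)."""

    def __init__(self):
        self.contents = {}  # digest -> whole file bytes
        self.files = {}     # file name -> digest (the reference)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # A repeated file adds only a new reference, not a new copy.
        self.contents.setdefault(digest, data)
        self.files[name] = digest

    def get(self, name):
        return self.contents[self.files[name]]
```

For example, putting two 8 KB files that share their first 4 KB block into a `BlockStore` leaves three stored blocks instead of four, while putting the same file twice into a `FileStore` leaves a single stored copy with two names pointing at it.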

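The two approaches have different performance characteristics. Block-level deduplication can find duplicates inside files that differ only in part, but it has to track many more units. File-level deduplication is cheaper to run, but it saves nothing when two files differ even slightly. Which method to use in a given case is determined by technical feasibility.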
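
One way the approaches differ is in how much space they recover. The rough sketch below compares the bytes kept under each approach for two hypothetical 16 KB files that differ only in their last 4 KB block. The function name `stored_size` and the sample data are assumptions made for illustration, and the sketch ignores the space taken by the references themselves.

```python
import hashlib


def stored_size(files, block_size=None):
    """Return the bytes kept after deduplication.

    block_size=None -> file-level: each distinct file is kept once.
    block_size=N    -> block-level: each distinct N-byte block is kept once.
    """
    unique = {}  # digest -> length of the unit kept for that digest
    for data in files:
        if block_size is None:
            units = [data]
        else:
            units = [data[i:i + block_size]
                     for i in range(0, len(data), block_size)]
        for unit in units:
            unique.setdefault(hashlib.sha256(unit).digest(), len(unit))
    return sum(unique.values())


# Five distinct 4 KB blocks; the two files share their first three blocks.
blocks = [bytes([i]) * 4096 for i in range(5)]
file_a = b"".join(blocks[:4])
file_b = b"".join(blocks[:3] + [blocks[4]])

file_level = stored_size([file_a, file_b])          # 32768 bytes kept
block_level = stored_size([file_a, file_b], 4096)   # 20480 bytes kept
```

In this case file-level deduplication keeps both files in full because they are not identical, while block-level deduplication keeps only the five distinct blocks.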
