Data Compression Explained (Simple)

This article explains the basic concept of data compression. It answers questions such as why data compression is used and how it works, as simply as possible.

Definition

Data compression means turning a piece of data into a smaller piece by encoding it differently. The encoding is often based on identifying repetitive patterns in the data.

Take for example the following set of characters:

XXXXXXXYYYYYYYZZZZZZZ

We can write each character just once, followed by the number of times it repeats, like below:

X7Y7Z7

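We have “compressed” the set of characters into a much shorter set. The set consisted of 21 characters; now there are only 6. We have saved 71.43% of the space. This percentage is usually called the space saving; the term compression ratio more commonly refers to the size of the input divided by the size of the output (here 21 / 6 = 3.5). The space saving is calculated as follows: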

1 - (size of output / size of input)
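
The example and the formula above can be reproduced in a few lines of Python. This is a minimal sketch of the counting scheme shown here (often called run-length encoding), not a production compressor; the function names `rle_encode` and `space_saving` are made up for illustration.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Write each run of identical characters as <character><count>.

    Illustrative only: the output is ambiguous if the input contains digits.
    """
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def space_saving(input_size: int, output_size: int) -> float:
    """Fraction of space saved: 1 - (size of output / size of input)."""
    return 1 - output_size / input_size


original = "XXXXXXXYYYYYYYZZZZZZZ"
encoded = rle_encode(original)
saving = space_saving(len(original), len(encoded))

print(encoded)          # X7Y7Z7
print(f"{saving:.2%}")  # 71.43%
```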

Why is Data Compression Used?

Data compression has significant benefits. By removing or re-encoding the repetitive parts of the data, less space is needed to store it. Also, when data is sent over a network, it takes less time to transfer.

So by applying data compression you save:

  • Storage space
  • Time

How Does Data Compression Work?

First, a compression algorithm is applied to change the data file into a smaller package. This smaller data package is then saved or transferred through a network. When somebody wants to access the data file, a decompression algorithm is applied, which reverses the compression algorithm. The decompression algorithm restores the compressed file to its original form.

So ultimately you have the same file while saving a lot of storage space and transfer time. The downside is that the file has to be processed every time someone wants to access it. However, in many cases this processing costs less than storing or transferring the uncompressed data.
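
The compress, store or send, then decompress cycle described above might look like this with Python's standard `zlib` module. It is only a sketch: the payload is a made-up stand-in for a real data file, and `zlib` is one of many possible algorithms.

```python
import zlib

# Hypothetical payload standing in for "the data file".
original = b"XXXXXXXYYYYYYYZZZZZZZ" * 100

compressed = zlib.compress(original)    # compression algorithm
# ... the smaller package would be saved or sent over a network here ...
restored = zlib.decompress(compressed)  # decompression algorithm

print(len(original), len(compressed))   # sizes before and after compression
print(restored == original)             # True: the original form is restored
```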

Compression Algorithms

Depending on the nature of the data, different kinds of algorithms are used. These algorithms can be divided into two broad categories: lossless and lossy compression.

Lossless compression compresses and decompresses the file without losing any parts of the original file. The focus is on identifying repetitive patterns in the file.

In lossy compression, parts of the original file are lost, so decompression yields only an approximation of the original. The result is often a file of lower quality than the original. The focus is on distinguishing important from less important parts and discarding the latter.
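
The difference between the two categories can be illustrated with a small sketch. The sample values and the step size are invented for this example, and rounding values into coarse buckets is only one simple way of discarding less important detail; real lossy formats are far more elaborate.

```python
import zlib

samples = [3, 18, 22, 41, 57, 64, 90, 127]  # hypothetical measurement values
STEP = 16                                    # hypothetical bucket width

# Lossless: the round trip returns exactly the original values.
packed = zlib.compress(bytes(samples))
lossless_restored = list(zlib.decompress(packed))

# Lossy: keep only which STEP-wide bucket each value falls into.
# Each value now needs fewer bits, but the fine detail is gone for good.
quantized = [value // STEP for value in samples]
lossy_restored = [q * STEP + STEP // 2 for q in quantized]

print(lossless_restored == samples)  # True
print(lossy_restored == samples)     # False: only an approximation remains
print(lossy_restored)
```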

Is Data Compression Always a Good Idea?

Data compression has significant benefits and is essential for a lot of technologies. However, it is not always applicable.

Compression algorithms are designed to find certain patterns in the data. When the content of the data is very random, there are few patterns to find, and the space saved can be very limited.
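
A quick experiment makes this visible. The sketch below compresses a repetitive input and a random-looking input of about the same size with Python's standard `zlib` module; the helper name `zlib_space_saving`, the input size, and the fixed seed are arbitrary choices for the illustration.

```python
import random
import zlib


def zlib_space_saving(data: bytes) -> float:
    """Fraction of space saved by zlib: 1 - (size of output / size of input)."""
    return 1 - len(zlib.compress(data)) / len(data)


SIZE = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable

repetitive = b"XXXXXXXYYYYYYYZZZZZZZ" * (SIZE // 21)
random_like = bytes(rng.getrandbits(8) for _ in range(SIZE))

print(f"repetitive: {zlib_space_saving(repetitive):.1%}")
print(f"random:     {zlib_space_saving(random_like):.1%}")
```

The repetitive input shrinks to a small fraction of its size, while the random-looking input barely shrinks at all and may even grow slightly.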

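This is also the case when the dataset is very small: there are few patterns to find, and the bookkeeping data that a compressed format adds can outweigh what is saved.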
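
As a small illustration, compressing a three-byte input with Python's standard `zlib` module produces more bytes than it started with, because the format adds a header and a checksum. The input is made up for the example.

```python
import zlib

tiny = b"XYZ"  # hypothetical very small dataset
compressed = zlib.compress(tiny)

# The "compressed" version is larger than the original.
print(len(tiny), len(compressed))
print(len(compressed) > len(tiny))  # True
```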

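Compression can also pose a security risk: because the size of the compressed output depends on the patterns in the data, someone who can observe that size may learn something about the content.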
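
The sketch below shows the idea in a heavily simplified form. It assumes a made-up scenario in which a secret and some text chosen by an outsider are compressed together, and the outsider can see only the size of the result; the secret, the message layout, and the function name `message_size` are all hypothetical.

```python
import zlib

SECRET = b"token=s3cr3t9x"  # hypothetical secret included in every message


def message_size(outsider_text: bytes) -> int:
    """Size of the compressed message, the only thing the outsider observes."""
    message = b"user=alice;" + SECRET + b";note=" + outsider_text
    return len(zlib.compress(message))


right_guess = message_size(b"token=s3cr3t9x")  # repeats the secret exactly
wrong_guess = message_size(b"token=qwplzmvk")  # same length, different value

# The correct guess creates a longer repeated pattern, so it compresses better.
print(right_guess, wrong_guess)
print(right_guess < wrong_guess)  # True
```

A smaller output tells the outsider that the guess matched data already present in the message, which is exactly the kind of pattern the compression revealed.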

Reasons not to compress:

  • Data too random
  • Dataset too small
  • Risk of revealing patterns