Incompressible zlib/deflate data

We notice that string "\x00\x01\x02...\xff", despite it's highly regular, is not compressible with LZ77 and/or Huffman coding.  To extend the idea a bit further, we could generate arbitrary long incompressible data:
  std::string incompressible;

  for (int step = 1; step <= 128; ++step)
  {
    for (int inc = 0; inc < step; ++inc)
    {
      for (int i = inc; i < 256; i += step)
      {
        incompressible.push_back(i);
      }
    }
  }
This generates a 32,768-byte string incompressible, with byte values:
0, 1, 2, 3, 4, 5, ..., 255,  // step = 1, inc = 0
0, 2, 4, 6, 8, 10, ..., 254,  // step = 2, inc = 0
1, 3, 5, 7, 9, 11, ..., 255,  // step = 2, inc = 1
0, 3, 6, 9, 12, 15, ...,  // step = 3, inc = 0
1, 4, 7, 10, 13, 16, ...,  // step = 3, inc = 1
2, 5, 8, 11, 14, 17, ...,  // step = 3, inc = 2
...
...

incompressible is not compressible with zlib/deflate, because:
  • frequency of every symbols (characters) are the same, so Huffman coding is helpless.
  • minimum match length of LZ77 chosen by deflate is 3, and incompressible contains no 3-tuple that duplicates. (Technically, there will be only literals in encoded deflate stream, no length-distance pairs.)
Since the max window size of deflate algorithm is 32,786 bytes, multiplication (concatenation) of incompressible is still incompressible.

0 comments: