Gojus
Gojus

Reputation: 21

Time complexity of optimal DEFLATE compression

The DEFLATE algorithm is specified in RFC 1951. However, the encoder can choose freely whether to insert a literal byte, or a sub-match from output buffer for each input byte. Assuming that everything is encoded in a single block with dynamic Huffman compression, what is the time complexity to achieve the optimal compression ratio as allowed by the spec?

I fear it may not be polynomial, especially because Zopfli exists (and they don't claim to produce optimal result). However, I was unable to find any actual analysis or proof that confirms this.

Upvotes: 0

Views: 73

Answers (1)

Mark Adler
Mark Adler

Reputation: 112502

Your premise is flawed. A single dynamic block will rarely be optimal. Better compression can be achieved with multiple dynamic blocks that adapt to the changing literal, length, and distance distributions in the data. If you make the blocks too long, you lose out on compression efficiency that easily pays for the occasional dynamic block header to describe new codes.

Your question is then also flawed, if by "time complexity", you mean O(some function of the uncompressed data length). All deflate compressors that I have seen will emit multiple deflate blocks of some constrained range of sizes. In that case, the time complexity is O(n), where n is the input length.

Perhaps what you are really after is information about the constant in front of the n. That constant can be large or small, depending on how much effort you want to put into finding matches, selecting matches, and arranging them into deflate blocks. That constant is not the time complexity.

Upvotes: 0

Related Questions