Using pre-calculated data when using gzip

Question

I would like to pre-calculate some structure (hash / dictionary / tree - depends on the terminology) and use it with gzip when I compress / decompress data.

The motivation is saving data over the wire in the following scenario:

I have many relatively small (several KB) textual responses a server sends to clients. Those responses have a very similar structure, but are not identical. I can put static structures in both the client and the server (they don't have to be the same).

The goal is to save CPU time by not recomputing something that may be computed multiple times for different server responses but, more importantly, to save bytes over the wire by using static structures.

Another option is to use a compression algorithm other than gzip, but I'd rather not.

Thanks!

Mark Adler · Accepted Answer

To "save bytes over the wire", you should use zlib's deflateSetDictionary() and inflateSetDictionary() operations. They allow you to provide up to 32K of data similar to what is being compressed, referred to as a "dictionary". The exact same dictionary needs to be available on the decompression end. There is nothing special about the dictionary. It can be constructed simply as a concatenation of as many of your several KB responses as will fit in 32K.
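As a rough illustration of the dictionary approach, here is a minimal sketch using Python's standard zlib module, whose zdict argument is applied through the same zlib dictionary mechanism. The sample responses and the helper names compress_with_dict and decompress_with_dict are made up for this example; real dictionaries would be built from actual server responses.

```python
import zlib

# Hypothetical sample responses; in practice these would be real server replies.
samples = [
    b'{"status":"ok","user":{"id":1,"name":"alice"},"items":[]}',
    b'{"status":"ok","user":{"id":2,"name":"bob"},"items":[1,2]}',
]

# The dictionary is just a concatenation of typical responses.
# Deflate can only reference the last 32 KiB, so trim to that size.
dictionary = b"".join(samples)[-32768:]


def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    """Compress payload into a zlib stream primed with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(payload) + c.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress a zlib stream that was produced with the same dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


response = b'{"status":"ok","user":{"id":3,"name":"carol"},"items":[7]}'
plain = zlib.compress(response, 9)                   # no dictionary
primed = compress_with_dict(response, dictionary)    # with dictionary
restored = decompress_with_dict(primed, dictionary)
```

For a short response like this one, the primed stream should come out noticeably smaller than the plain one, because most of the response can be encoded as back-references into the dictionary. Both ends must hold byte-identical copies of the dictionary.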

This will use the zlib format instead of the gzip format, as the gzip format has no provisions for the use of a preset dictionary.
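The difference between the two containers is visible in the first bytes of the output. The sketch below, again using Python's zlib module with made-up payload and dictionary values, produces a gzip stream and a dictionary-primed zlib stream and inspects their headers. A gzip stream starts with the magic bytes 0x1f 0x8b, while a zlib stream carries an FDICT flag (bit 5 of the second header byte) that signals a preset dictionary.

```python
import zlib

# Illustrative values only.
payload = b"hello hello hello"
dictionary = b"hello "

# wbits = 16 + MAX_WBITS asks zlib to emit a gzip container.
g = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
gzip_stream = g.compress(payload) + g.flush()

# The default wbits emits a zlib container; zdict sets the preset dictionary.
z = zlib.compressobj(zdict=dictionary)
zlib_stream = z.compress(payload) + z.flush()

is_gzip = gzip_stream[:2] == b"\x1f\x8b"      # gzip magic number
uses_deflate = (zlib_stream[0] & 0x0F) == 8   # CM field: 8 means deflate
has_fdict = bool(zlib_stream[1] & 0x20)       # FDICT flag in the zlib header
```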

There will be no gain in CPU time, and in fact it will cost a little more CPU time to process the 32K dictionary. But in the case you describe, it can dramatically improve the compression. You can reduce the CPU time by processing the 32K dictionary once and then copying the deflate state with deflateCopy() for reuse.
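The copy-and-reuse idea can be sketched in Python as well: a Compress object's copy() method duplicates the compressor state, so the dictionary only has to be loaded once into a template object. The dictionary contents and the function names here are assumptions for illustration.

```python
import zlib

# Hypothetical dictionary built from typical response text.
dictionary = b'{"status":"ok","user":{"id":0,"name":""},"items":[]}' * 4

# Load the dictionary once into a template compressor that is never
# used directly for output.
template = zlib.compressobj(level=9, zdict=dictionary)


def compress_from_template(payload: bytes) -> bytes:
    """Compress payload using a copy of the pre-primed template."""
    c = template.copy()  # duplicates the already-primed state
    return c.compress(payload) + c.flush()


def decompress(blob: bytes) -> bytes:
    """Decompress with the same dictionary on the receiving side."""
    d = zlib.decompressobj(zdict=dictionary)
    return d.decompress(blob) + d.flush()


first = b'{"status":"ok","user":{"id":7,"name":"dave"},"items":[1]}'
second = b'{"status":"ok","user":{"id":8,"name":"erin"},"items":[2,3]}'
out_first = compress_from_template(first)
out_second = compress_from_template(second)
```

Because each response gets its own copy, the template stays untouched and can serve any number of responses.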
