Reputation: 67

How to determine what the size will be after zlib uncompression?

I am writing a simple C++ app that sends compressed data to my API. The API's response is also compressed, so the app has to uncompress it. I am using zlib's uncompress function, but I do not know how big the uncompressed data will be. Can someone help me with this? How can I calculate and set the size of the destination buffer?

Upvotes: 0

Answers (3)

Mark Adler

Reputation: 112374

You cannot know just from the compressed data itself. Since you are in control of your own API, simply send the uncompressed length ahead of the compressed data.

For those who come here looking for a way to learn the uncompressed size so they can allocate a buffer for the whole thing, and who have no access to the size ahead of time: my recommendation is simply not to do that. Instead, use zlib's inflate interface to decompress a chunk at a time, which is its purpose in life. You can then consume, transmit, save, or otherwise process the resulting uncompressed data a chunk at a time. Or, if you must keep the whole thing in memory, reallocate the buffer to grow it as needed.

Upvotes: 2

PeterT

Reputation: 8284

I think the documentation is really clear about this:

ZEXTERN int ZEXPORT uncompress OF((Bytef *dest, uLongf *destLen,
                               const Bytef *source, uLong sourceLen));
Decompresses the source buffer into the destination buffer. sourceLen is the byte length of the source buffer. Upon entry, destLen is the total size of the destination buffer, which must be large enough to hold the entire uncompressed data. (The size of the uncompressed data must have been saved previously by the compressor and transmitted to the decompressor by some mechanism outside the scope of this compression library.) Upon exit, destLen is the actual size of the uncompressed data.

uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough memory, Z_BUF_ERROR if there was not enough room in the output buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete. In the case where there is not enough room, uncompress() will fill the output buffer with the uncompressed data up to that point.

So zlib recommends sending over the uncompressed size along with the compressed stream.

But we can also note the sentence

In the case where there is not enough room, uncompress() will fill the output buffer with the uncompressed data up to that point.

So you can write the length at the very beginning of the data before compressing it. At the destination, start by uncompressing into a small buffer. That will probably not hold everything, but it will hold enough for you to read the length from the first bytes. You can then use that to allocate or resize your destination buffer and call uncompress again.

Whether this is a good idea depends on your use case. If your message sizes don't vary much and the program is long-running, it might be better to maintain a single destination buffer and grow it as needed.

Upvotes: 2

B Abali

Reputation: 443

As a speed optimization, if you are willing to make an occasional redundant call to uncompress, you can predict the size of the dest buffer for the next call. Data segments in a given stream often compress by approximately the same factor; text, for example, typically compresses by 2x to 3x. So record the size that the last call needed and allocate the same amount for the next uncompress call. If that is too little (Z_BUF_ERROR), increase the buffer size and repeat. If it is too much, there is no problem; just reduce the size for the next call.

Here is an additional optimization. Suppose your dest is going to be very large, say gigabytes, and you do not want to waste CPU cycles on trial decompressions. You can feed in only the first few hundred KB of your source data and see how much it expands, then allocate the actual dest buffer accordingly. I don't know whether uncompress() will let you do that, but inflate() will.

Upvotes: 0
