c00kiemonster

Reputation: 23351

What is this unexpected behavior in Python's zlib?

I have an example in zlib that gives an unexpected result.

I start by compressing a simple string and hex-encoding the result:

>>> import zlib
>>> import binascii
>>> compressed = binascii.hexlify(zlib.compress('first_message'))
>>> compressed
'789c4bcb2c2a2e89cf4d2d2e4e4c4f05002651056d'

Now if I do the reverse I get exactly what I expected:

>>> zlib.decompress(compressed.decode("hex"))
'first_message'

However, if I use a decompression object and drop the last two hex digits (one byte) of the compressed data, I still get the full string:

>>> d = zlib.decompressobj()
>>> d.decompress(compressed.decode("hex"))
'first_message'
>>> d = zlib.decompressobj()
>>> d.decompress(compressed[:-2].decode("hex"))
'first_message'

What am I missing here? Why do these two return the same result?

Upvotes: 1

Views: 424

Answers (1)

Mark Adler

Reputation: 112284

Because that is how much decompressed data is available given the input. A zlib.decompressobj() allows you to feed the object chunks of the compressed data, and will return as much decompressed data as possible. You can then feed it more to get more.
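As a minimal sketch of that chunked feeding, written for Python 3 (so the data are `bytes`, unlike the Python 2 session in the question): the 8-byte chunk size and the repeated test string are arbitrary choices made for illustration only.

```python
import zlib

# Illustrative input; repeated so the compressed stream spans several chunks.
data = b"first_message" * 100
compressed = zlib.compress(data)

d = zlib.decompressobj()
pieces = []

# Feed the compressed stream 8 bytes at a time (arbitrary chunk size).
# Each call returns whatever output can be produced from the input so far,
# which may be an empty bytes object.
for i in range(0, len(compressed), 8):
    pieces.append(d.decompress(compressed[i:i + 8]))

# flush() returns any output still held by the object (usually empty here).
pieces.append(d.flush())

result = b"".join(pieces)
print(result == data)
```

The individual pieces can have any length, including zero; only their concatenation is meaningful.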

You will still get all of the decompressed data if you cut off only the last four or five bytes. In that case you are knocking off the final Adler-32 check (four bytes) and possibly the end code of the last deflate block plus the unused bits that pad it to a byte boundary (one more byte). None of those are needed to decompress the data; they only mark the end of the data and provide a check on its integrity. In your example, compressed[:-2] removes two hex digits, which is just one byte of the Adler-32 check.
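The effect of losing part of the trailer can be checked with a short Python 3 sketch. The variable names are illustrative; `eof` is the attribute of decompression objects (Python 3.3 and later) that reports whether the end of the stream has been reached.

```python
import zlib

compressed = zlib.compress(b"first_message")

# Drop the final byte, which is one byte of the four-byte Adler-32 trailer.
truncated = compressed[:-1]

d = zlib.decompressobj()
partial_output = d.decompress(truncated)

# All of the data comes out, but the stream is not finished:
# the integrity check has not been read, so nothing has been verified.
stream_complete = d.eof

# The one-shot function requires a complete stream and raises on truncation.
try:
    zlib.decompress(truncated)
    one_shot_failed = False
except zlib.error:
    one_shot_failed = True

# Supplying the missing byte completes the stream and runs the check.
tail_output = d.decompress(compressed[-1:])
complete_after_tail = d.eof

print(partial_output, stream_complete, one_shot_failed, complete_after_tail)
```

When using a decompression object, checking `eof` after the last chunk is one way to tell a complete, verified stream from a truncated one.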

Upvotes: 2
