will

Reputation: 1511

Python zlib not compressing string?

I'm currently working through the Python tutorial on python.org. I'm on section 10.9 and am trying to use the zlib library to compress a string. However, the len(compressedString) isn't always less than the len(originalString). My interpreter session is below:

>>> import zlib
>>> s = 'the quick brown fox jumps over the lazy dog'
>>> len(s)
43
>>> t = zlib.compress(s)
>>> len(t)
50
>>> t
'x\x9c+\xc9HU(,\xcdL\xceVH*\xca/\xcfSH\xcb\xafP\xc8*\xcd-(V\xc8/K-R(\x01J\xe7$VU*\xa4\xe4\xa7\x03\x00a<\x0f\xfa'
>>> len(zlib.decompress(t))
43
>>> s2 = "something else i'm compressing"
>>> len(s2)
30
>>> t2 = zlib.compress(s2)
>>> len(t2)
37
>>> s3 = "witch which has which witches wrist watch"
>>> len(s3)
41
>>> t3 = zlib.compress(s3)
>>> len(t3)
37

Does anyone know why this is happening?

Upvotes: 1

Views: 7199

Answers (2)

Martijn Pieters

Reputation: 1125208

The zlib compression algorithm does not always shrink its input; a very short string can come out larger than it went in:

>>> len(zlib.compress('ab'))
10

That is because it needs to add metadata (headers, symbol tables, back-references) that can amount to more data than the input you tried to compress. Use it on longer, more repetitive data and it'll compress things just fine:

>>> lorem = 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit'
>>> len(lorem) * 100
9100
>>> len(zlib.compress(lorem * 100))
123
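For anyone on Python 3, where zlib.compress() takes bytes rather than str, here is a minimal sketch of the same comparison. The report() helper and the three sample inputs are made up for illustration; exact compressed sizes depend on the zlib build, so none are listed here.

```python
import zlib


def report(data):
    """Return (original size, compressed size) for one bytes input."""
    return len(data), len(zlib.compress(data))


short = b"ab"
sentence = b"the quick brown fox jumps over the lazy dog"
repetitive = sentence * 100  # long and highly redundant

for label, data in [("short", short),
                    ("sentence", sentence),
                    ("repetitive", repetitive)]:
    original, compressed = report(data)
    print(f"{label}: {original} -> {compressed} bytes")

# Compression is lossless regardless of whether the output got smaller.
assert zlib.decompress(zlib.compress(repetitive)) == repetitive
```

The short input grows because the fixed overhead dominates, while the repetitive input shrinks dramatically because the same sentence is encoded as back-references.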

Upvotes: 11

Mark Adler

Reputation: 112617

"However, the len(compressedString) isn't always less than the len(originalString)."

That would, of course, be impossible, at least if you expect to always be able to losslessly recover the original string.
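The reason is a counting argument: there are more strings of a given length than there are strings shorter than it, so a lossless compressor cannot map every input to a distinct shorter output. A small sketch of the count, assuming byte strings (alphabet of 256 symbols); the function names are made up for illustration:

```python
def count_strings(length, alphabet_size=256):
    """Number of distinct strings of exactly this length."""
    return alphabet_size ** length


def count_shorter(length, alphabet_size=256):
    """Number of distinct strings strictly shorter than this length
    (including the empty string)."""
    return sum(alphabet_size ** k for k in range(length))


n = 3
inputs = count_strings(n)
shorter_outputs = count_shorter(n)
print(inputs, shorter_outputs)  # 16777216 65793

# Fewer possible outputs than inputs: at least two inputs would have to
# share an output, so decompression could not tell them apart.
assert shorter_outputs < inputs
```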

The deflate algorithm will, however, never expand the data by more than a small percentage, plus six bytes for the zlib header and trailer. The zlib header identifies the data as a zlib stream, and the trailer provides an integrity check on the data.
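The six bytes can be observed directly by compressing the same data once as a zlib stream and once as raw deflate, which Python's zlib module selects with a negative wbits value. This is a sketch for Python 3; the deflate() helper is made up for illustration, and level 6 is just an arbitrary choice kept the same for both streams.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog"


def deflate(data, wbits):
    """Compress with an explicit wbits: 15 = zlib wrapper, -15 = raw deflate."""
    co = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return co.compress(data) + co.flush()


zlib_stream = deflate(data, 15)
raw_stream = deflate(data, -15)

# Wrapper overhead: 2-byte header plus 4-byte trailer.
overhead = len(zlib_stream) - len(raw_stream)
header, trailer = zlib_stream[:2], zlib_stream[-4:]

print("overhead:", overhead)
# The two header bytes, read as a big-endian integer, are a multiple of 31.
print("header check ok:", int.from_bytes(header, "big") % 31 == 0)
# The trailer is the Adler-32 checksum of the uncompressed data.
print("trailer is adler32:", int.from_bytes(trailer, "big") == zlib.adler32(data))
```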

Upvotes: 2
