Reputation: 1511
I'm going through the python.org Python tutorial at the moment. I'm on section 10.9, and I'm trying to use the zlib library to compress a string. However, len(compressedString)
isn't always less than len(originalString)
. My interpreter session is below:
>>> import zlib
>>> s = 'the quick brown fox jumps over the lazy dog'
>>> len(s)
43
>>> t = zlib.compress(s)
>>> len(t)
50
>>> t
'x\x9c+\xc9HU(,\xcdL\xceVH*\xca/\xcfSH\xcb\xafP\xc8*\xcd-(V\xc8/K-R(\x01J\xe7$VU*\xa4\xe4\xa7\x03\x00a<\x0f\xfa'
>>> len(zlib.decompress(t))
43
>>> s2 = "something else i'm compressing"
>>> len(s2)
30
>>> t2 = zlib.compress(s2)
>>> len(t2)
37
>>> s3 = "witch which has which witches wrist watch"
>>> len(s3)
41
>>> t3 = zlib.compress(s3)
>>> len(t3)
37
Does anyone know why this is happening?
Upvotes: 1
Views: 7199
Reputation: 1125208
The zlib compression algorithm is not always efficient:
>>> len(zlib.compress('ab'))
10
because it needs to add metadata (a stream header, a checksum trailer, and Huffman code tables) that can amount to more data than what you tried to compress. Use it on longer, less random data and it'll compress things just fine:
>>> lorem = 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit'
>>> len(lorem) * 100
9100
>>> len(zlib.compress(lorem * 100))
123
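The same contrast can be reproduced in Python 3, where zlib.compress takes bytes rather than str, so the inputs below are byte literals; the exact compressed lengths may vary slightly between zlib versions:

```python
import zlib

# Very short input: the fixed zlib header/trailer and block metadata
# outweigh any savings, so the output is longer than the input.
short = b'ab'
print(len(short), len(zlib.compress(short)))

# Longer, highly repetitive input: backreferences to earlier copies of
# the repeated text pay off, and the result shrinks dramatically.
lorem = b'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit'
big = lorem * 100
print(len(big), len(zlib.compress(big)))
```

The second case compresses so well precisely because deflate can encode each repetition of `lorem` as a short backreference to the first copy.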
Upvotes: 11
Reputation: 112617
However, the len(compressedString) isn't always less than the len(originalString).
That would, of course, be impossible, at least if you expect to always be able to losslessly retrieve the original string: an algorithm that shrank every input could not be reversible, since there are more possible inputs of length n than there are outputs shorter than n, so some distinct inputs would have to compress to the same output.
The deflate algorithm will, however, never expand the data by more than a small percentage, plus six bytes for the zlib wrapper. The two-byte zlib header identifies the stream as zlib, and the four-byte trailer carries an Adler-32 integrity check on the data.
Upvotes: 2