Reputation: 187
I have an encoded string that I managed to decode without knowing how it was originally encoded. This is how I decoded it:
import base64
import zlib
original_str = "LONG_SNIP"  # clearly a base64 string
decoded_str = base64.b64decode(original_str)  # becomes an unreadable mess
decompressed_str = zlib.decompress(decoded_str, -15)  # plain text, success
Note that the second zlib.decompress argument, -15, is mandatory (any value between -8 and -15 works).
However, if I want to encode a plain text string into this exact same format, so that the above code would successfully decode that one too, I run into problems.
I checked the zlib documentation and tried zlib.compress, as well as creating a compressobj and attempting to compress with it but had no success.
It seems like this '-15' value can't be input into any function to reverse the decompression that I did originally.
This is what I also tried, but I'm getting blank output:
compress = zlib.compressobj( 1, zlib.DEFLATED, -15, zlib.DEF_MEM_LEVEL, 0 )
deflated = compress.compress(string_to_compress)
encoded = base64.b64encode(deflated)
print(encoded)
QUESTIONS:
What does the integer parameter mean and why do all values between -8 and -15 give the same exact output?
And more importantly, how could I reverse my decompression?
Answers are very much appreciated, thanks!
Upvotes: 0
Views: 2246
Reputation: 1121486
The second parameter to zlib.decompress() is the wbits argument. From the documentation:
The wbits parameter controls the size of the history buffer (or “window size”), and what header and trailer format is expected. It is similar to the parameter for compressobj(), but accepts more ranges of values:
- [...]
- −8 to −15: Uses the absolute value of wbits as the window size logarithm. The input must be a raw stream with no header or trailer.
- [...]
When decompressing a stream, the window size must not be smaller than the size originally used to compress the stream; using a too-small value may result in an error exception.
Negative values simply mean that there is no header or trailer in the data stream.
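To make the header and trailer concrete, here is a small sketch (Python 3; the sample payload is only an illustration) that compares a default zlib stream with what remains once the 2-byte header and the 4-byte Adler-32 trailer are cut off. The remainder is a raw deflate stream, which only decodes with a negative wbits:

```python
import struct
import zlib

data = b"foo bar baz " * 20  # arbitrary sample payload

wrapped = zlib.compress(data)  # zlib format: header + deflate data + Adler-32
raw = wrapped[2:-4]            # strip the 2-byte header and 4-byte trailer

# The trailer is the big-endian Adler-32 checksum of the uncompressed data.
(trailer,) = struct.unpack(">I", wrapped[-4:])
trailer_matches = trailer == zlib.adler32(data)

# A raw stream has to be decoded with a negative wbits value.
roundtrip = zlib.decompress(raw, -15)

# Without the negative wbits, zlib looks for a header that is not there.
try:
    zlib.decompress(raw)
    raw_needs_negative_wbits = False
except zlib.error:
    raw_needs_negative_wbits = True
```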
So if any value between -8 and -15 works, the window size used for compression was quite small to begin with. Bigger window sizes require more memory for the larger history buffer, but make decompression go faster. The only requirement is that the window should be equal to or larger than the one used to compress the data, because otherwise references to previous data blocks used in the compression stream can't be found any more (I think; I'm sure Mark Adler will correct me on this if I'm wrong).
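A quick way to check this for a short input (a sketch in Python 3 with a made-up payload): compress once as a raw stream with the smallest window that compressobj() accepts, then decompress with every allowed wbits value and compare the results.

```python
import zlib

data = b"foo bar baz"  # made-up sample input

# Raw deflate stream with a 2**9-byte window.
compressor = zlib.compressobj(-1, zlib.DEFLATED, -9)
raw = compressor.compress(data) + compressor.flush()

# Decompress with every raw window size from -15 up to -8.
results = {wbits: zlib.decompress(raw, wbits) for wbits in range(-15, -7)}

all_same = all(result == data for result in results.values())
print(all_same)
```

On the decompression side, wbits only sizes the history buffer; it does not change the decoded bytes.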
The zlib manual seems to suggest that wbits=8 would actually automatically be replaced with wbits=9, and presumably the same would happen with -8.
This translates to zlib.compressobj() wbits values between -9 and -15; from the documentation again:
- −9 to −15: Uses the absolute value of wbits as the window size logarithm, while producing a raw output stream with no header or trailing checksum.
Compressing with the smallest window size should suffice:
compressor = zlib.compressobj(-1, zlib.DEFLATED, -9)
compressed = compressor.compress(data_to_compress) + compressor.flush()
Demo:
>>> import zlib
>>> compressor = zlib.compressobj(-1, zlib.DEFLATED, -9)
>>> compressor.compress('foo bar baz') + compressor.flush()
'K\xcb\xcfWHJ,\x02\xe2*\x00'
>>> zlib.decompress(_, -8)
'foo bar baz'
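Putting this together with the base64 step from the question, a round trip might look like the sketch below. It is written for Python 3, where the compressor takes bytes rather than str. The helper names encode and decode are made up for illustration, and UTF-8 is an assumption about the text encoding. The flush() call is what the attempt in the question was missing: for a short input, compress() on its own typically returns nothing because the data is still buffered.

```python
import base64
import zlib


def encode(text):
    """Raw-deflate the text, then base64-encode it (hypothetical helper)."""
    compressor = zlib.compressobj(-1, zlib.DEFLATED, -9)
    # flush() emits the data that compress() is still buffering.
    deflated = compressor.compress(text.encode("utf-8")) + compressor.flush()
    return base64.b64encode(deflated)


def decode(encoded):
    """Mirror of the decoding steps shown in the question."""
    return zlib.decompress(base64.b64decode(encoded), -15).decode("utf-8")


encoded = encode("foo bar baz")
print(decode(encoded))

# The question's attempt without flush(): for a short input the compressor
# typically has not emitted anything yet.
unflushed = zlib.compressobj(1, zlib.DEFLATED, -15, zlib.DEF_MEM_LEVEL, 0)
print(unflushed.compress(b"foo bar baz"))
```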
Upvotes: 1