Reputation: 9
I've run into a bit of an issue with gzip compression.
Say we compress the letter 'a' with the pako library in JavaScript, as follows:
pako.gzip('a')
This returns the following byte array:
[31,139,8,0,0,0,0,0,0,3,75,4,0,67,190,183,232,1,0,0,0]
Now, if I take this array and decompress it in Python, I get the expected result:
import gzip
arr_from_pako = bytearray([31,139,8,0,0,0,0,0,0,3,75,4,0,67,190,183,232,1,0,0,0])
decompressed = gzip.decompress(arr_from_pako)
print(decompressed)
>>> b'a'
However, if I then reverse the process and gzip-compress the result in Python, I get a different byte array:
arr_from_python = list(gzip.compress(decompressed))
print(arr_from_python)
>>> [31, 139, 8, 0, 123, 15, 138, 98, 2, 255, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0]
What I need to do here is reproduce the output of pako.gzip in Python. I'm guessing the difference comes from the compression level. I know Python's gzip library lets you adjust it, but I tried every setting and could not get the expected result.
Can anyone help?
Upvotes: 0
Views: 1180
Reputation: 112384
You can't, and you don't need to. There is no issue.
The difference in the examples you give is only in the gzip header, where Python puts in a modification time and sets some other header bytes to different values than pako does. You can pass mtime=0 to gzip.compress() (available since Python 3.8) to get rid of the modification time, but you will still have the other differences. (0, 3 vs. 2, 255: those are the XFL and OS bytes in the header.)
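To see exactly where the two outputs differ, you can decode the fixed 10-byte gzip header defined in RFC 1952 (ID1, ID2, CM, FLG, a 4-byte little-endian MTIME, XFL, OS). The sketch below is illustrative; `header_fields` is a hypothetical helper, and the exact OS byte Python writes may vary between Python versions, so only compare it by eye.

```python
import gzip
import struct

# Bytes produced by pako.gzip('a') in the question.
PAKO = bytes([31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0])

def header_fields(data: bytes) -> dict:
    """Hypothetical helper: decode the fixed 10-byte gzip header (RFC 1952)."""
    id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack("<BBBBIBB", data[:10])
    return {"magic": (id1, id2), "CM": cm, "FLG": flg,
            "MTIME": mtime, "XFL": xfl, "OS": os_byte}

# mtime=0 removes the timestamp, so XFL and OS are what remain different.
py = gzip.compress(b"a", mtime=0)

print(header_fields(PAKO))
print(header_fields(py))
print(PAKO[10:] == py[10:])  # compare everything after the header
```

With the default compresslevel=9, Python writes XFL=2 ("maximum compression"), while pako wrote 0 here.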
You could manually rewrite the header in Python to be the same as pako's. However, that still won't meet your objective in the long run, once you are compressing reasonable amounts of data. Then, even if you force the headers to be the same, the compressed data can and likely will be different. The only way to ensure that the compressed data is the same is for you to be personally in control of the libraries used in both Python and pako, making sure that they are built from the same source code, the same version of that source code, and are run with exactly the same controlling parameters. I doubt that you are, and so you are on a fool's errand trying to make them the same.
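For completeness, here is a minimal sketch of what "manually rewriting the header" could look like. `gzip_like_pako` is a hypothetical helper, and the values it writes (MTIME=0, XFL=0, OS=3) are simply the ones observed in the question's pako output, not something pako guarantees. Only the header is patched; the deflate data after it is still whatever Python's zlib produces.

```python
import gzip

# Bytes produced by pako.gzip('a') in the question.
PAKO = bytes([31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0])

def gzip_like_pako(data: bytes, level: int = 9) -> bytes:
    """Hypothetical helper: gzip `data`, then patch the header to look like pako's.

    Assumes the header values seen in the question (MTIME=0, XFL=0, OS=3).
    Only the 10-byte header is patched; the deflate stream after it may still
    differ from pako's on larger input.
    """
    out = bytearray(gzip.compress(data, compresslevel=level, mtime=0))
    out[8] = 0  # XFL: "extra flags" hint about the compression level used
    out[9] = 3  # OS: 3 means Unix in RFC 1952
    return bytes(out)

patched = gzip_like_pako(b"a")
print(patched[:10] == PAKO[:10])  # True: headers now match
print(gzip.decompress(patched))   # still a valid gzip stream
```

Patching these two bytes does not break decompression because the CRC-32 in the gzip trailer covers only the uncompressed data, and no header CRC is present when FLG is 0.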
There is no need to make them the same. All you need from a lossless compressor is compression followed by decompression to give you exactly what you put in. That's it. You have that. There is never any assurance of the other direction, i.e., that you can get decompression followed by compression to give you the same thing.
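In practice this means comparing gzip streams by what they decompress to, not by their bytes. A small sketch (`same_payload` is a hypothetical helper name):

```python
import gzip

# Bytes produced by pako.gzip('a') in the question.
PAKO = bytes([31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 75, 4, 0, 67, 190, 183, 232, 1, 0, 0, 0])

def same_payload(a: bytes, b: bytes) -> bool:
    """Compare two gzip streams by their decompressed content, not their bytes."""
    return gzip.decompress(a) == gzip.decompress(b)

original = gzip.decompress(PAKO)      # b'a'
python_gz = gzip.compress(original)   # header carries the current time, XFL=2

print(python_gz == PAKO)              # False: the bytes differ
print(same_payload(python_gz, PAKO))  # True: the content is identical

# The guarantee that matters: compress-then-decompress is lossless at every level.
text = b"The quick brown fox jumps over the lazy dog. " * 100
for level in range(10):
    assert gzip.decompress(gzip.compress(text, compresslevel=level)) == text
```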
Upvotes: 0