AmOs

Reputation: 149

Decompressing and recompressing the same files produces a different archive size

Here is exactly what happened: I received a zip file from a friend of mine. The file has the following metadata:

1518852 Nov 19 15:10 friend.zip  
32e44a2d0283d81629dcf749fc3ced83c47efd7b firend.zip

Then I unzipped this file, without touching its contents or even reading them, and zipped it again. That produced this:

1519608 Nov 19 15:31 mine.zip
0aaea14e59971c40ba1de04558e44b211ac4c628  mine.zip

I tried this on Linux, Windows, and Mac, on different physical machines (not virtual machines) and on different architectures, AMD and Intel: my laptop, my PC, and a Mac mini. They all produce the same values: the same difference in archive size (756 bytes) and the same SHA-1 sum. This is driving me insane! I did not touch the files at all, not even cat file.txt. Nothing!

These are the contents of the archive: app, code.txt, config.xml, .DS_Store, images, index.html, .settings, widget.info

It's a JavaScript application. Nothing is compiled, it's just plain text. Only .DS_Store is a binary file, and I don't know what it represents.

I should mention that on Windows I opened both archives while they were still compressed, and every entry shows the same CRC in both.

Only one thing differs between the two archives: a field called PACKED. For .DS_Store it has a value of 15 in the original zip file and a value of 13 in mine.

What is this? How could this happen?

Can .zip files be signed? I mean, if the archive was signed with some special parameter, would that make a difference when decompressing and recompressing?

Upvotes: 0

Views: 5206

Answers (2)

Martin Geisse

Reputation: 1471

Most real-world compression algorithms do not deterministically compress to a specific size, unless you make sure that all the parameters to the algorithm and all implementation details are exactly the same. Note that this may include hidden parameters which you cannot set as the user of a program.

To clarify what I mean by "hidden parameters": Imagine the compression algorithm as a program function. Lots of variables must be set to initial values. For some of them, more than one value makes sense, depending on the expected input, compression level, and so on. Even the compression "level" is a vague thing: the user expects to specify a number between, say, 1 and 9, but internally there are lots of switches that must be set accordingly, and there's a certain degree of freedom in how that "level" is mapped to the actual initialization values. One programmer who implements the algorithm might do things a little differently from another, and both implementations are considered "correct" in the sense that you can compress and decompress with either program; they just don't produce the exact same output size.
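As a rough illustration of the point about levels, here is a minimal Python sketch using the standard zlib module (which implements DEFLATE, the method most zip tools use). The sample data and the helper name compressed_sizes are made up for this example; the exact sizes you get depend on your zlib build, which is rather the point.

```python
import zlib


def compressed_sizes(data: bytes, levels=(0, 1, 6, 9)) -> dict:
    """Compress the same bytes at several levels and return {level: size}.

    Every result must decompress back to the identical input, even though
    the compressed sizes differ from level to level.
    """
    sizes = {}
    for level in levels:
        packed = zlib.compress(data, level)
        # Round trip: the content is unchanged regardless of the level used.
        assert zlib.decompress(packed) == data
        sizes[level] = len(packed)
    return sizes


# Hypothetical sample input: repetitive, JavaScript-like text.
sample = b"".join(
    b"var item%d = %d;\n" % (i, (i * 7) % 13) for i in range(5000)
)

if __name__ == "__main__":
    print("original:", len(sample))
    for level, size in compressed_sizes(sample).items():
        print("level", level, "->", size, "bytes")
```

The same input yields several valid compressed forms of different lengths, all of which decompress to identical bytes. Two zip programs (or one program with different defaults) can differ in the same way.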

As to the problem of signing the zip file: Could you describe in detail what you are trying to accomplish? It sounds a bit like you want to ensure the integrity of the file... but I suspect you actually want to ensure integrity of the contents of the zipfile. And there's your answer: Generate a "table of contents", then generate a signature of the contents including the ToC and add that. (Whether the ToC includes itself and/or the signature is irrelevant, just do it the same way on all systems)

That way, the signature makes sure no file was altered, including the ToC, and the ToC makes sure no file was added or removed.
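A minimal Python sketch of that idea, under some assumptions: the "ToC" here is a sorted list of member names with a SHA-256 hash of each member's uncompressed bytes, and a plain digest of that list stands in for the signature (a real signature would sign this digest with a key). The function names and the sample files are hypothetical.

```python
import hashlib
import io
import zipfile


def content_manifest(zip_bytes: bytes) -> str:
    """Build a 'table of contents': one 'sha256  name' line per member, sorted."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            # Hash the uncompressed content, not the compressed stream.
            digest = hashlib.sha256(zf.read(name)).hexdigest()
            lines.append(f"{digest}  {name}")
    return "\n".join(lines) + "\n"


def manifest_digest(zip_bytes: bytes) -> str:
    """Digest of the manifest; this is the value one would sign."""
    return hashlib.sha256(content_manifest(zip_bytes).encode("utf-8")).hexdigest()


def build_zip(files: dict, method: int) -> bytes:
    """Pack {name: bytes} into an in-memory zip using the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


# Hypothetical sample content, packed two different ways.
files = {
    "index.html": b"<html><body>hello</body></html>\n" * 50,
    "config.xml": b"<widget id='example'/>\n" * 50,
}
stored = build_zip(files, zipfile.ZIP_STORED)
deflated = build_zip(files, zipfile.ZIP_DEFLATED)

if __name__ == "__main__":
    print("archive bytes equal:  ", stored == deflated)
    print("content digests equal:", manifest_digest(stored) == manifest_digest(deflated))
```

The two archives differ byte for byte, so their SHA-1 sums differ, yet the manifest digest is the same because it depends only on the file names and their uncompressed contents.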

Upvotes: 3

Danstahr

Reputation: 4319

The output depends on the compression algorithm's settings. In the archiver, you can usually set many parameters, such as the compression level and the amount of resources used to compress or decompress the file. See the specification for details.

Upvotes: 0
