SyntaxT3rr0r

Reputation: 28313

Java/zip: Why are .jar files non-deterministically created?

I had never really looked into it, but I just realized that I can't easily build two identical .jar files.

I mean, if I build twice, without changing anything, I get the exact same size but different checksums for the .jar.

So I quickly ran a test (basically unzipping, sort -n -k 5'ing and then diff'ing) and saw that all the files inside the .jars were identical, yet the .jars themselves were different.

So I did a test with a plain .zip file and found this:

... $ zip 1.zip a.txt
... $ zip 2.zip a.txt
... $ ls -l ?.zip
-rw-rw-r-- 1 webinator webinator 147 2010-07-21 13:09 1.zip
-rw-rw-r-- 1 webinator webinator 147 2010-07-21 13:09 2.zip

(exact same .zip file size)

... $ sha1sum ?.zip
db99f6ad5733c25c0ef1695ac3ca3baf5d5245cf  1.zip
eaf9f0f92eb2ac3e6ac33b44ef45b170f7984a91  2.zip

(different SHA-1 sums, let's see why)

$ hexdump 1.zip -C > 1.txt

$ hexdump 2.zip -C > 2.txt

$ diff 1.txt 2.txt 
3c3
< 00000020  74 78 74 55 54 09 00 03  ab d4 46 4c*4e*d5 46 4c  |txtUT.....FLN.FL|
---
> 00000020  74 78 74 55 54 09 00 03  ab d4 46 4c*5d*d5 46 4c  |txtUT.....FL].FL|

Unzipping either .zip file of course gives back the same a.txt.

Question: why is that? (I'll answer myself)

Upvotes: 12

Views: 875

Answers (1)

SyntaxT3rr0r

Reputation: 28313

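(Answering my own question) It is because the .zip file format saves timestamps in its headers. In the hexdump above, the differing byte sits right after the "UT" marker, which is the extended timestamp extra field written by Info-ZIP's zip: it holds the file's modification time followed by its access time, and the two archives recorded access times a few seconds apart.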
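To make that concrete, here is a small sketch that decodes the bytes following "UT" in the two hexdump lines from the question. It assumes those bytes follow Info-ZIP's extended timestamp layout (2-byte ID, 2-byte little-endian size, 1 flags byte, then 32-bit Unix times); the names `parse_ut`, `extra_1` and `extra_2` are mine, purely for illustration.

```python
import struct
from datetime import datetime, timezone

# Bytes copied from offset 0x23 of the two diffed hexdump lines:
# "UT", data size, flags, then two little-endian 32-bit Unix times.
extra_1 = bytes.fromhex("5554 0900 03 abd4464c 4ed5464c")  # from 1.zip
extra_2 = bytes.fromhex("5554 0900 03 abd4464c 5dd5464c")  # from 2.zip


def parse_ut(extra):
    """Split a 'UT' extra field into (id, size, flags, mtime, atime).

    Assumes flags == 0x03, i.e. modification time and access time are
    both present, which is what the dump in the question shows.
    """
    header_id, size, flags = struct.unpack_from("<2sHB", extra, 0)
    mtime, atime = struct.unpack_from("<II", extra, 5)
    return header_id, size, flags, mtime, atime


for label, extra in (("1.zip", extra_1), ("2.zip", extra_2)):
    _, _, flags, mtime, atime = parse_ut(extra)
    print(label, hex(flags),
          datetime.fromtimestamp(mtime, tz=timezone.utc),
          datetime.fromtimestamp(atime, tz=timezone.utc))

# Same modification time in both archives; only the access time moved.
print(parse_ut(extra_2)[4] - parse_ut(extra_1)[4])  # 15 (seconds)
```

So the content and the modification time are the same in both archives; the only thing that changed between the two runs is the recorded access time.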

If you really do want to create two identical .zip (or .jar) files, you have to make the second one record exactly the same timestamps as the first one.
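One way to do that is to write the archive yourself and pin every entry's timestamp instead of letting the tool read it from the filesystem. Below is a minimal sketch using Python's standard `zipfile` module, not what `zip` or `jar` do internally; `build_zip` and `FIXED_TIME` are hypothetical names, and the pinned date is arbitrary.

```python
import hashlib
import io
import zipfile

# Arbitrary pinned timestamp (an assumption for this sketch). The classic
# zip header stores times with 2-second resolution, starting at 1980.
FIXED_TIME = (2010, 7, 21, 13, 6, 18)


def build_zip(files, date_time=FIXED_TIME):
    """Return the bytes of a zip holding {name: data}, stamped date_time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


first = build_zip({"a.txt": b"hello\n"})
second = build_zip({"a.txt": b"hello\n"})

# Both digests are the same because no wall-clock time leaks into the file.
print(hashlib.sha1(first).hexdigest())
print(hashlib.sha1(second).hexdigest())
```

Entries are also written in sorted order, since entry order is another thing that can differ between builds. In Java, `java.util.zip.ZipEntry.setTime(long)` plays the same role as `date_time` does here.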

Upvotes: 6
