Reputation: 1005
This is on a Mac, if it matters. zip is version 3.0 and unzip is version 6.0 (I expect these are what ships with the OS).
I do the following: start with a generic pptx file, unzip it into a directory, clean up the XML, then zip it up again:
unzip V1.pptx -d dir
cd dir
find . -name "*.xml" -type f -exec xmllint --output '{}' --format '{}' \;
zip -0 ../V1Orig.pptx -r *
I now have a new zip file, V1Orig.pptx, in the parent directory. I go back up and repeat the same steps on it:
cd ..
unzip V1Orig.pptx -d copy
cd copy
find . -name "*.xml" -type f -exec xmllint --output '{}' --format '{}' \;
zip -0 ../V1Copy.pptx -r *
If I now 'diff' the orig and copy directories, they are the same:
Common subdirectories: orig/_rels and copy/_rels
Common subdirectories: orig/docProps and copy/docProps
Common subdirectories: orig/ppt and copy/ppt
But if I diff the pptx files themselves, or compare their md5 checksums, they differ:
diff V1Orig.pptx V1Copy.pptx
Binary files V1Orig.pptx and V1Copy.pptx differ
ls -rtla orig
total 8
drwxr-xr-x 11 fultonm wheel 352 10 Jan 16:49 ppt
drwxr-xr-x 5 fultonm wheel 160 10 Jan 16:49 docProps
drwxr-xr-x 3 fultonm wheel 96 10 Jan 16:49 _rels
drwxr-xr-x 6 fultonm wheel 192 14 Jan 10:40 .
-rw-r--r-- 1 fultonm wheel 3212 14 Jan 10:42 [Content_Types].xml
drwxr-xr-x 8 fultonm wheel 256 14 Jan 10:57 ..
ls -rtla copy
total 8
drwxr-xr-x 5 fultonm wheel 160 14 Jan 10:42 docProps
drwxr-xr-x 3 fultonm wheel 96 14 Jan 10:42 _rels
drwxr-xr-x 6 fultonm wheel 192 14 Jan 10:42 .
drwxr-xr-x 11 fultonm wheel 352 14 Jan 10:42 ppt
-rw-r--r-- 1 fultonm wheel 3212 14 Jan 10:42 [Content_Types].xml
drwxr-xr-x 8 fultonm wheel 256 14 Jan 10:57 ..
Upvotes: 0
Views: 44
Reputation: 112394
You can get them to be the same by making the timestamps of all of the files and directories match, and by using the -X option so that zip does not save extra file attribute information.
So for each zip command, use -rX, and in the copy directory do:
find . -exec touch -r ../dir/{} {} \;
before the zip.
Why it should matter that the zip files be identical, I have no idea. What matters is that they both decompress to the same thing.
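If the contents are what you actually care about, a check along these lines may be enough. It is only a sketch: same_contents is a hypothetical helper name, and it assumes unzip and diff are available.

```shell
#!/bin/sh
# same_contents A.zip B.zip
# Succeeds (exit 0) if both archives extract to identical directory trees.
same_contents() {
    sc_tmp=$(mktemp -d) || return 2
    # unzip creates the target directory given to -d if it does not exist.
    unzip -q "$1" -d "$sc_tmp/a" &&
    unzip -q "$2" -d "$sc_tmp/b" &&
    diff -r "$sc_tmp/a" "$sc_tmp/b" >/dev/null
    sc_status=$?
    rm -rf "$sc_tmp"
    return "$sc_status"
}

# Usage, with the file names from the question:
#   same_contents V1Orig.pptx V1Copy.pptx && echo "same contents"
```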
Upvotes: 1
Reputation: 1005
I believe the problem is that the time stamps are being recorded. These of course will be different because the xmllint process changes the times.
If I do unzip -l, I see that the order of the files in the zip file is stable, since it appears to be sorted by name rather than by time. The date and time stamps are recorded too, though, and of course those differ.
The 'fix' is likely to ensure the time stamps are not updated by any of the unzip/xmllint steps so that when it is zipped up again, it has the original time stamps.
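One way to sketch that idea: snapshot each file's modification time before reformatting and put it back afterwards. keep_mtime is a hypothetical helper name of my own, and the xmllint call in the usage comment is the one from the question. I have not tried this against a real pptx.

```shell
#!/bin/sh
# keep_mtime FILE CMD [ARGS...]
# Runs CMD (expected to rewrite FILE in place), then restores FILE's
# original modification time. Returns CMD's exit status.
keep_mtime() {
    km_file=$1
    shift
    km_stamp=$(mktemp) || return 1
    touch -r "$km_file" "$km_stamp"   # remember the old mtime on a scratch file
    "$@"
    km_status=$?
    touch -r "$km_stamp" "$km_file"   # put the old mtime back
    rm -f "$km_stamp"
    return "$km_status"
}

# Usage from inside the unzipped directory:
#   find . -name '*.xml' -type f | while IFS= read -r f; do
#       keep_mtime "$f" xmllint --output "$f" --format "$f"
#   done
```

Since unzip restores the stored times on extraction, keeping them intact across the xmllint step should leave the second archive with the same time stamps as the first.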
Better answers appreciated!
Upvotes: 0