Reputation: 117
I need to write a directory structure into a zip file. The structure contains a single file that is duplicated in 50 subdirectories. When users download and extract the zip file, the duplicated file needs to appear in every subdirectory. Is there a way to store the file once in the zip file, yet have it extracted into all of the subdirectories? I cannot use shortcuts.
It would seem like Zip would be smart enough to recognize that I have 50 duplicate files and automatically store the file once... It would be silly to make this file 50 times larger than necessary!
Upvotes: 3
Views: 966
Reputation: 117
I just wanted to clarify that the StuffIt solution only removes duplicate files when compressing to its own proprietary format, not to ZIP.
Upvotes: 1
Reputation: 5638
It is possible within the ZIP specification to have multiple entries in the central directory point to the same local header offset. To do this, the ZIP application would have to precalculate the CRC of the file it was about to add and then search the central directory of the existing ZIP file for an entry with a matching CRC. That lookup would be expensive against a ZIP file that contains a huge number of entries. Precalculating the CRC would also be costly for huge files, since the CRC is usually computed during the compression routine rather than in a separate pass.
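The lookup step described above can be sketched in Python with the standard `zipfile` and `zlib` modules. This is only an illustration of the duplicate check, not of the shared-offset trick itself: `find_duplicate_entry` is a hypothetical helper name, and the final byte-for-byte comparison is an added assumption on my part, since a matching CRC and size do not by themselves prove that two files are identical.

```python
import io
import zipfile
import zlib


def find_duplicate_entry(archive, data):
    """Return the name of an existing entry whose content equals data, or None.

    Hypothetical helper: `archive` is a zipfile.ZipFile opened for reading.
    """
    # Precalculate the CRC of the candidate file before it is added.
    crc = zlib.crc32(data) & 0xFFFFFFFF
    # Linear scan of the central directory; this is the part that gets
    # expensive when the archive holds a huge number of entries.
    for info in archive.infolist():
        if info.file_size == len(data) and info.CRC == crc:
            # CRC and size match, so confirm with a full comparison.
            if archive.read(info.filename) == data:
                return info.filename
    return None


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("dir01/shared.txt", b"same payload" * 100)
    with zipfile.ZipFile(buf, "r") as zf:
        print(find_duplicate_entry(zf, b"same payload" * 100))
```

Building a dictionary keyed on `(file_size, CRC)` once, instead of scanning `infolist()` for every new file, would reduce the lookup cost for large archives, at the price of holding that index in memory.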
I have not heard of a specific ZIP application that makes this optimization. However, the StuffIt X format does appear to support duplicate file optimization:
The StuffIt X format supports "Duplicate Detection". When adding files to an archive, StuffIt detects if there are duplicate items (even if they have different file names), and only compresses the duplicates once, no matter how many copies there are. When expanded, StuffIt recreates all the duplicates from that one instance. Depending on the data being compressed, it can offer significant reductions in size and compression time.
Upvotes: 2