Glen Low

Reputation: 4497

Can zip files be sparse/non-contiguous?

The zip file format ends with a central directory section that then points to the individual zip entries within the file. This appears to allow zip entries to occur anywhere within the zip file itself. Indeed, self-extracting zip files are a good example: they start with an executable and all the zip entries occur after the executable bytes.
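As a rough illustration of the self-extracting case, the sketch below prepends arbitrary bytes to an ordinary archive and reads it back with Python's standard `zipfile` module, which locates entries through the central directory. The `stub` bytes are a made-up stand-in for an executable, not a real self-extractor.

```python
import io
import zipfile

# Build an ordinary archive in memory.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w") as zf:
    zf.writestr("hello.txt", "hello")

# Prepend arbitrary bytes, standing in for an executable stub (hypothetical).
stub = b"#!fake-executable-stub\n" * 4
prefixed = stub + plain.getvalue()

# A reader that starts from the central directory still finds the entry.
with zipfile.ZipFile(io.BytesIO(prefixed)) as zf:
    names = zf.namelist()
    payload = zf.read("hello.txt")

print(names, payload)
```

This only shows that one central-directory-based reader tolerates leading bytes; it says nothing about readers that work differently.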

The question is: does the zip file format really allow sparse or non-contiguous zip entries, e.g. with empty or otherwise unaccounted bytes between them? Both the definitive PKWARE APPNOTE and the Wikipedia article seem to allow this. Will all, or at least most, typical zip utilities work with such sparse zip files?

The use case is this: I want to be able to delete or replace zip entries in a zip file. To do this, typical libraries such as minizip want you to copy the entire zip file to a new one, skipping the deleted or replaced entry, which seems wasteful and slow.

Wouldn't it be better to over-allocate, say 1.5x the storage for an entry, then when deleting or replacing an entry you could figure out where the unallocated bytes were and use those directly? Using 1.5x the storage means that if the zip entry grew linearly, the reallocations should also happen amortized linearly. It would be similar to file system block allocation though probably not as sophisticated.
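To make the amortization argument concrete, here is a small simulation. It is only a sketch: `reserve` and `simulate` are hypothetical names, and the model assumes an entry is moved to a fresh, larger slot whenever it outgrows its reservation.

```python
def reserve(size, factor=1.5):
    """Bytes to reserve for an entry that currently holds `size` bytes."""
    return int(size * factor) + 1


def simulate(final_size, step=1):
    """Grow an entry `step` bytes at a time; return (relocations, bytes_copied)."""
    size = 0
    capacity = reserve(0)
    relocations = 0
    bytes_copied = 0
    while size < final_size:
        size += step
        if size > capacity:
            # The slot is full: the entry moves to a new, larger slot.
            relocations += 1
            bytes_copied += size
            capacity = reserve(size)
    return relocations, bytes_copied


relocations, bytes_copied = simulate(100_000)
print(relocations, bytes_copied)
```

Because each reservation is at least 1.5x the size at the time of the move, the sizes at successive moves form a geometric sequence, so the total bytes copied stay within a constant multiple of the final size.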

This would also help with the many zip-based file formats out there. Today an editor typically unzips the files into a temp directory (or into memory), changes them, and then rezips the lot back into the file format. In-place replacement would reduce how much of the zip file has to be rezipped and rewritten.

Are there any C/C++ libraries out there that do this?

Upvotes: 8

Views: 1271

Answers (1)

Mark Adler

Reputation: 112284

No. Reading the central directory is optional. Zip decoders can, and some do, simply read the zip file sequentially from the beginning, expecting to see the local headers and entry data contiguously. They can complete the job of decoding without ever having looked at the central directory.
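The difference between the two reading strategies can be sketched as follows. The archive is hand-built with stored (uncompressed) entries and a run of unaccounted zero bytes after each one; `build` and `scan_sequential` are hypothetical helpers written for this illustration, and the scanner assumes no data descriptors or zip64 fields.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG, CENTRAL_SIG, END_SIG = 0x04034B50, 0x02014B50, 0x06054B50
DOS_DATE = (1 << 5) | 1  # 1980-01-01


def local_record(name, data):
    """Local file header followed by the name and stored data."""
    header = struct.pack("<IHHHHHIIIHH", LOCAL_SIG, 20, 0, 0, 0, DOS_DATE,
                         zlib.crc32(data), len(data), len(data), len(name), 0)
    return header + name + data


def central_record(name, data, offset):
    """Central directory header pointing at a local header offset."""
    header = struct.pack("<IHHHHHHIIIHHHHHII", CENTRAL_SIG, 20, 20, 0, 0, 0,
                         DOS_DATE, zlib.crc32(data), len(data), len(data),
                         len(name), 0, 0, 0, 0, 0, offset)
    return header + name


def build(entries, gap):
    """Lay out entries with `gap` unaccounted zero bytes after each one."""
    body, central = b"", b""
    for name, data in entries:
        central += central_record(name, data, len(body))
        body += local_record(name, data) + b"\0" * gap
    end = struct.pack("<IHHHHIIH", END_SIG, 0, 0, len(entries), len(entries),
                      len(central), len(body), 0)
    return body + central + end


def scan_sequential(blob):
    """Walk local headers from offset 0, never consulting the central directory."""
    names, pos = [], 0
    while True:
        (sig,) = struct.unpack_from("<I", blob, pos)
        if sig == CENTRAL_SIG:
            return names  # reached the central directory: done
        if sig != LOCAL_SIG:
            raise ValueError(f"expected a local header at offset {pos}")
        csize, _usize, nlen, xlen = struct.unpack_from("<IIHH", blob, pos + 18)
        names.append(blob[pos + 30:pos + 30 + nlen].decode())
        pos += 30 + nlen + xlen + csize


sparse = build([(b"a.txt", b"alpha"), (b"b.txt", b"beta")], gap=16)

# Reader that follows the central directory (Python's zipfile does this).
with zipfile.ZipFile(io.BytesIO(sparse)) as zf:
    via_directory = {n: zf.read(n) for n in zf.namelist()}

# Reader that streams from the front trips over the gap.
try:
    scan_sequential(sparse)
    sequential_ok = True
except ValueError:
    sequential_ok = False

print(via_directory, sequential_ok)
```

With `gap=0` the same scanner walks both entries and stops at the central directory, which is the layout a streaming decoder expects.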

In order to do what you want, you would need to put in dummy zip entries between the useful entries in order to hold that space. At least if you want to be compatible with the rest of the zip world.
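A minimal sketch of the dummy-entry idea, using Python's standard `zipfile` module: each real entry is followed by a stored placeholder entry whose data reserves the space. The `.pad/N` naming scheme and the helper names are assumptions made for illustration, not a convention that other tools recognize.

```python
import io
import zipfile


def write_with_padding(entries, pad_bytes):
    """Write each entry followed by a stored dummy entry that reserves space."""
    buf = io.BytesIO()
    # The default compression is ZIP_STORED, so each placeholder's data
    # occupies exactly pad_bytes in the file.
    with zipfile.ZipFile(buf, "w") as zf:
        for index, (name, data) in enumerate(entries):
            zf.writestr(name, data)
            # Hypothetical naming scheme for placeholder entries.
            zf.writestr(f".pad/{index}", b"\0" * pad_bytes)
    return buf.getvalue()


def is_contiguous(blob):
    """True if every local record starts exactly where the previous one ends."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        expected = 0
        for info in sorted(zf.infolist(), key=lambda i: i.header_offset):
            if info.header_offset != expected:
                return False
            # 30 is the fixed local header size. Assumes the local extra field
            # matches the central one and no data descriptor follows the data.
            expected = (info.header_offset + 30 + len(info.filename.encode())
                        + len(info.extra) + info.compress_size)
    return True


blob = write_with_padding([("a.txt", b"alpha"), ("b.txt", b"beta")], 64)
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    names = zf.namelist()
    pad_size = zf.getinfo(".pad/0").compress_size

print(names, pad_size, is_contiguous(blob))
```

The cost is that the placeholders show up as real entries to every tool, so an application using this layout would have to hide or ignore them itself.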

Upvotes: 7
