dafnahaktana

Reputation: 857

How to create zip file with data-descriptor section

Is there a way to create a zip file and to force it to have data-descriptor section from the command line?

Upvotes: 3

Views: 2838

Answers (1)

Rob W

Reputation: 349042

In a comment on Github (https://github.com/adamhathcock/sharpcompress/issues/88#issuecomment-215696631), I found a suggestion to use the -fd flag:

Just FYI, when creating the ZIP file i also used the the command line parameter -fd which enforces usage of data descriptors. Not sure whether the ZIP tool on OSX provides this parameter, but i noticed that you didn't use it when creating your ZIP file

So I tested it (with the standard zip tool on OS X, "Zip 3.0 (July 5th 2008)"), and confirmed that it indeed generates a zip file with the data descriptor set, as follows:

/tmp> touch empty.txt
/tmp> zip -fd foo.zip empty.txt
  adding: empty.txt (stored 0%)
/tmp> xxd foo.zip
00000000: 504b 0304 0a00 0800 0000 698d 7c49 0000  PK........i.|I..
00000010: 0000 0000 0000 0000 0000 0900 1c00 656d  ..............em
00000020: 7074 792e 7478 7455 5409 0003 a65e 3c58  pty.txtUT....^<X
00000030: a65e 3c58 7578 0b00 0104 f501 0000 0400  .^<Xux..........
00000040: 0000 0050 4b07 0800 0000 0000 0000 0000  ...PK...........
00000050: 0000 0050 4b01 021e 030a 0008 0000 0069  ...PK..........i
00000060: 8d7c 4900 0000 0000 0000 0000 0000 0009  .|I.............
00000070: 0018 0000 0000 0000 0000 00b0 8100 0000  ................
00000080: 0065 6d70 7479 2e74 7874 5554 0500 03a6  .empty.txtUT....
00000090: 5e3c 5875 780b 0001 04f5 0100 0004 0000  ^<Xux...........
000000a0: 0000 504b 0506 0000 0000 0100 0100 4f00  ..PK..........O.
000000b0: 0000 5300 0000 0000                      ..S.....

The 16 bytes at offsets 0x43 to 0x52 in the dump above (50 4b07 08 followed by twelve zero bytes) are the data descriptor section. Its signature 50 4b07 08 (or PK..) and the data descriptor format are described in the zip specification (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT):

  4.3.9  Data descriptor:

        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes

      4.3.9.1 This descriptor MUST exist if bit 3 of the general
      purpose bit flag is set (see below).  It is byte aligned
      and immediately follows the last byte of compressed data.
      This descriptor SHOULD be used only when it was not possible to
      seek in the output .ZIP file, e.g., when the output .ZIP file
      was standard output or a non-seekable device.  For ZIP64(tm) format
      archives, the compressed and uncompressed sizes are 8 bytes each.

...
      4.3.9.3 Although not originally assigned a signature, the value 
      0x08074b50 has commonly been adopted as a signature value 
      for the data descriptor record.  Implementers should be 
      aware that ZIP files may be encountered with or without this 
      signature marking data descriptors and SHOULD account for
      either case when reading ZIP files to ensure compatibility.
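
The layout quoted above can be decoded with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name parse_data_descriptor and the dictionary keys are made up for illustration, it assumes a non-ZIP64 archive (4-byte sizes), and it treats a leading 0x08074b50 as the optional signature, which would misfire in the rare case where the CRC-32 itself has that value.

```python
import struct

# Commonly used (but optional) data descriptor signature, see 4.3.9.3.
DATA_DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(buf: bytes, offset: int = 0) -> dict:
    """Decode a non-ZIP64 data descriptor that starts at `offset` in `buf`.

    Assumption: a leading 0x08074b50 is the signature, not the CRC-32.
    """
    (first_word,) = struct.unpack_from("<I", buf, offset)
    has_signature = first_word == DATA_DESCRIPTOR_SIGNATURE
    # Skip the 4 signature bytes when they are present.
    fields_at = offset + 4 if has_signature else offset
    crc32, compressed_size, uncompressed_size = struct.unpack_from(
        "<III", buf, fields_at
    )
    return {
        "has_signature": has_signature,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "length": 16 if has_signature else 12,
    }
```

Applied to the 16 bytes from the dump (50 4b 07 08 plus twelve zero bytes), this reports a signature, a CRC-32 of 0 and sizes of 0, which matches an empty stored file.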

To find out whether bit 3 of the general purpose bit flag is set, we have to parse the zip file and locate the file header for empty.txt.
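
As a cross-check that avoids reading hex by hand, Python's standard zipfile module can do this parsing and exposes each entry's general purpose bit flag as ZipInfo.flag_bits. The sketch below is only an illustration; the helper name entries_with_data_descriptor is hypothetical, and foo.zip in the usage note refers to the archive created in the transcript above.

```python
import zipfile

DATA_DESCRIPTOR_BIT = 0x0008  # bit 3 of the general purpose bit flag


def entries_with_data_descriptor(source) -> list:
    """Return the names of entries whose flag has bit 3 set.

    `source` may be a path or a seekable binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.flag_bits & DATA_DESCRIPTOR_BIT
        ]
```

Calling entries_with_data_descriptor("foo.zip") lists every entry that was written with a data descriptor; an empty list means no entry has the bit set.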

See Wikipedia for a brief overview and tables describing the meaning of the bytes in a zip file: https://en.wikipedia.org/wiki/Zip_(file_format)
The last 22 bytes, starting on the penultimate line at 504b 0506 (or PK..), are the end of central directory (EOCD) record. At offset 16 within this EOCD record, a 4-byte unsigned integer specifies the start of the central directory. We have 5300 0000 (little endian), or 0x53 = 83. This happens to be the offset right after the data descriptor section that we have identified above. At offset 8 from the start of the central directory (after the 4-byte signature 504b 0102, the 2-byte "version made by" field and the 2-byte "version needed to extract" field, here 0a 00), we find the pair of bytes that form the general purpose bit flag. The local file header at the start of the file carries the same flag at offset 6.

08 00 (little endian) = 00000000 00001000 (binary, big endian)
                                     ^
                            bit 3 of the general purpose flag

Indeed, bit 3 (counting from the right, starting at 0) is set, so the zip file created above does have a data descriptor section.
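
The manual walk through the archive can also be written down as a short Python sketch. It makes several simplifying assumptions: no ZIP64, the last occurrence of the EOCD signature is the real one (an archive comment containing PK\x05\x06 would break this), and only the first central directory entry is inspected. The function names are made up for illustration. Per the specification, the flag sits at offset 8 of a central directory file header (and at offset 6 of a local file header).

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"        # end of central directory record
CENTRAL_HEADER_SIGNATURE = b"PK\x01\x02"  # central directory file header


def first_entry_flags(data: bytes) -> int:
    """Return the general purpose bit flag of the first central directory entry."""
    # Assumption: the last occurrence of the signature starts the EOCD record.
    eocd = data.rfind(EOCD_SIGNATURE)
    if eocd < 0:
        raise ValueError("no end of central directory record found")
    # Offset 16 in the EOCD: 4-byte little-endian start of the central directory.
    (cd_start,) = struct.unpack_from("<I", data, eocd + 16)
    if data[cd_start:cd_start + 4] != CENTRAL_HEADER_SIGNATURE:
        raise ValueError("central directory header not found at recorded offset")
    # Offset 8 in the central directory file header: 2-byte bit flag.
    (flags,) = struct.unpack_from("<H", data, cd_start + 8)
    return flags


def has_data_descriptor(data: bytes) -> bool:
    """True if bit 3 of the first entry's general purpose bit flag is set."""
    return bool(first_entry_flags(data) & 0x0008)
```

For example, has_data_descriptor(open("foo.zip", "rb").read()) checks the archive from the transcript; for the bytes shown in the xxd dump, first_entry_flags returns 8.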

Upvotes: 4
