Christopher King
Christopher King

Reputation: 1062

Node zlib incremental inflate

I've located the end of a local file header in download stream of a large zip file that

and would like to now inflate that data using Node zlib but I cannot figure out how to feed data into zlib and receive feedback telling me when the deflate stream has self terminated.

Does Node's zlib library support consuming chunks of deflate data and returning a result letting the caller know when the deflate stream has ended?

Or is this an insane thing to do because it would imply I'm inflating on the UI thread and what I should really do is save the downloaded file and once downloaded use an NPM package? Hm.. well.. either the network is faster than inflation in which case streaming inflation would slow the network (bummer) or the network is slower than streaming inflation so why deflate while streaming (which I can't figure out how to do anyway) when I could simply saving to disk and reload-deflate while I'm sitting around waiting for the network..

Still, for my edification, I'd still like to know if Node supports streaming inflation.

var zlib = require('zlib')
var data = bufferOfChunkOfDeflatedData
var inflate = zlib.createInflate();
var stream = inflate.pipe(fs.createWriteStream(path));
var result = stream.write(data);
// but result doesn't indicate if the inflate stream has terminated...

Describes deflate headers and how they encode the length of the stream: https://www.bolet.org/~pornin/deflate-flush-fr.html


In memory stream: https://www.npmjs.com/package/memory-streams


Well, this guy just pulls till he hits the magic signature! :) https://github.com/EvanOxfeld/node-unzip/blob/5a62ecbcef6523708bb8b37decaf6e41728ac7fc/lib/parse.js#L152


Node code for configuring convenience method: https://github.com/nodejs/node/blob/6e56771f2a9707ddf769358a4338224296a6b5fe/lib/zlib.js#L83 Specifically: https://nodejs.org/api/zlib.html#zlib_zlib_inflateraw_buffer_options_callback


Eh, looks like node is setup to return the decompressed buffer as one block to the callback; Doesn't look like node is setup to figure out the end of the deflate stream.

https://nodejs.org/api/stream.html#stream_transform_transform_chunk_encoding_callback says The callback function must be called only when the current chunk is completely consumed. and here's the spot where it passes the chunk to zlib https://github.com/nodejs/node/blob/6e56771f2a9707ddf769358a4338224296a6b5fe/lib/zlib.js#L358. So there's no opportunity to say the stream was partially consumed..


But then again... https://github.com/ZJONSSON/node-unzipper/blob/affbf89b54b121e85dcd31adf7b1dfde58afebb7/lib/parse.js#L161 but not really. Also just checks for the magic sig: https://github.com/ZJONSSON/node-unzipper/blob/affbf89b54b121e85dcd31adf7b1dfde58afebb7/lib/parse.js#L153


And from the zip spec:

4.3.9.3 Although not originally assigned a signature, the value 0x08074b50 has commonly been adopted as a signature value for the data descriptor record. Implementers SHOULD be aware that ZIP files MAY be encountered with or without this signature marking data descriptors and SHOULD account for either case when reading ZIP files to ensure compatibility.

So looks like everyone just looks for the sig.


Mark says that's a no-no... So don't do that. And know that if your using an NPM lib to unzip, then there's a good chance the lib is doing that. To do it right would require, I think, grocking this from the zlib API docs: https://zlib.net/manual.html

The Z_BLOCK option assists in appending to or combining deflate streams. To assist in this, on return inflate() always sets strm->data_type to the number of unused bits in the last byte taken from strm->next_in, plus 64 if inflate() is currently decoding the last block in the deflate stream, plus 128 if inflate() returned immediately after decoding an end-of-block code or decoding the complete header up to just before the first byte of the deflate stream. The end-of-block will not be indicated until all of the uncompressed data from that block has been written to strm->next_out. The number of unused bits may in general be greater than seven, except when bit 7 of data_type is set, in which case the number of unused bits will be less than eight. data_type is set as noted here every time inflate() returns for all flush options, and so can be used to determine the amount of currently consumed input in bits.

This seems to indicate the final compressed bit will not be byte aligned. Yet the ZIP spec seems to indicate header that starts with the magic sig, the one everyone is using but shouldn't, is byte aligned: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

4.3.9.1 This descriptor MUST exist if bit 3 of the general purpose bit flag is set (see below). It is byte aligned and immediately follows the last byte of compressed data. This descriptor SHOULD be used only when it was not possible to seek in the output .ZIP file, e.g., when the output .ZIP file was standard output or a non-seekable device. For ZIP64(tm) format archives, the compressed and uncompressed sizes are 8 bytes each.

How can the end of the deflate stream not be byte aligned but the following Data descriptor be byte aligned?

Is there a nice reference implementation?


Reference impl using Inflate with Z_BLOCK: https://github.com/madler/zlib/blob/master/examples/gzappend.c


This guys reads backwards to pull out the directory: https://github.com/antelle/node-stream-zip/blob/907c8876e8aeed6c33a668bbd06a0f79e7a022ef/node_stream_zip.js#L180 Is this necessary?

This guy seems to think that zips cannot be inflated without reading the whole file to get to the directory: https://www.npmjs.com/package/yauzl#no-streaming-unzip-api

I don't see why that would be the case. The streams describe their length... and Mark verifies they can be streamed.


And here is where Node.js checks for Z_STREAM_END!

Upvotes: 0

Views: 1596

Answers (1)

Mark Adler
Mark Adler

Reputation: 112502

It looks like it does, since the documentation lists zlib.constants.Z_STREAM_END as a possible return value.

Upvotes: 0

Related Questions