gmletzkojr

Reputation: 405

Uncompress a gzip File in Memory Using zlib Version 1.1.3

I have a gzip file in memory, and I would like to uncompress it using zlib version 1.1.3. uncompress() returns -3 (Z_DATA_ERROR), which indicates that the source data is corrupt.

I know that my in-memory buffer is correct: if I write the buffer out to a file, it is identical to my source gzip file.

The gzip file format specifies a 10-byte header, optional header fields, the compressed data, and a footer. Is it possible to determine where the data starts and strip the rest out? I searched on this topic, and a couple of people have suggested using inflateInit2(). However, in my version of zlib, that function is oddly commented out. Are there any other options?

Upvotes: 2

Views: 3632

Answers (2)

Dirk Thannhäuser

Reputation: 283

I came across the same problem with a different zlib version (1.2.7).
I don't know why inflateInit2() is commented out.

Without calling inflateInit2 you can do the following:

err = inflateInit(&d_stream);
err = inflateReset2(&d_stream, 31);

inflateReset2 is also called by inflateInit. Inside inflateInit, the window bits are set to 15 (1111 binary), but you have to set them to 31 (11111 binary) for gzip to work.

The reason is that inflateReset2 does the following:

wrap = (windowBits >> 4) + 1;

which evaluates to 1 if the window bits are set to 15 (1111 binary) and to 2 if they are set to 31 (11111 binary).
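To make the arithmetic concrete, here is a small sketch that only mirrors the expression quoted above. wrap_for_window_bits and accepts_gzip are hypothetical helper names I made up for illustration, not zlib functions, and the sketch assumes the 1.2.x behaviour described in this answer.

```c
#include <stdio.h>

/* Mirrors the expression quoted from inflateReset2() in zlib 1.2.x.
   Hypothetical helper, not part of zlib. */
int wrap_for_window_bits(int windowBits)
{
    return (windowBits >> 4) + 1;
}

/* Nonzero when bit 1 of wrap is set, which is what the gzip header
   check in inflate() (state->wrap & 2) looks at. Hypothetical helper. */
int accepts_gzip(int windowBits)
{
    return (wrap_for_window_bits(windowBits) & 2) != 0;
}

/* Prints the wrap value for one windowBits setting. */
void print_wrap(int windowBits)
{
    printf("windowBits=%d wrap=%d gzip=%s\n", windowBits,
           wrap_for_window_bits(windowBits),
           accepts_gzip(windowBits) ? "yes" : "no");
}
```

Calling print_wrap(15) and print_wrap(31) shows wrap going from 1 to 2, which is exactly the bit the gzip header check tests.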

Now if you call inflate(), the following line in the HEAD state checks state->wrap along with the gzip magic number:

if ((state->wrap & 2) && hold == 0x8b1f) {  /* gzip header */

So with the following code I was able to do in-memory gzip decompression. (Note: this code presumes that the complete compressed data is in memory and that the buffer for the decompressed data is large enough.)

int err;
z_stream d_stream; // decompression stream

d_stream.zalloc = (alloc_func)0;
d_stream.zfree = (free_func)0;
d_stream.opaque = (voidpf)0;

d_stream.next_in  = deflated; // where deflated is a pointer to the compressed data buffer
d_stream.avail_in = deflatedLen; // where deflatedLen is the length of the compressed data
d_stream.next_out = inflated; // where inflated is a pointer to the resulting uncompressed data buffer
d_stream.avail_out = inflatedLen; // where inflatedLen is the size of the uncompressed data buffer

err = inflateInit(&d_stream);
err = inflateReset2(&d_stream, 31); // 31 = 15 + 16: expect a gzip header and trailer
err = inflate(&d_stream, Z_FINISH); // Z_STREAM_END means all of the data was decompressed
// d_stream.total_out now holds the number of decompressed bytes
err = inflateEnd(&d_stream);

Uncommenting inflateInit2() is the other solution. With it you can set the window bits directly.
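On the "buffer is large enough" assumption: the gzip trailer ends with a 4-byte little-endian field (ISIZE) holding the uncompressed length modulo 2^32, so it can serve as a hint for sizing the output buffer. The sketch below is only that, a hint. gzip_isize is a hypothetical helper name, and the value is only meaningful for a single-member gzip file under 4 GiB from a source you trust.

```c
#include <stddef.h>

/* Reads ISIZE, the last four bytes of a gzip member (little-endian,
   uncompressed length modulo 2^32). Hypothetical helper, not a zlib call.
   Returns 0 on success, -1 if the buffer is too short to hold even the
   10-byte header plus the 8-byte trailer. */
int gzip_isize(const unsigned char *buf, size_t len, unsigned long *isize)
{
    const unsigned char *p;

    if (buf == NULL || isize == NULL || len < 18)
        return -1;

    p = buf + len - 4;
    *isize = (unsigned long)p[0]
           | ((unsigned long)p[1] << 8)
           | ((unsigned long)p[2] << 16)
           | ((unsigned long)p[3] << 24);
    return 0;
}
```

With the names from the code above, something like gzip_isize(deflated, deflatedLen, &n) would give a value to allocate for inflated. The return code of inflate() should still be checked, because the trailer can be wrong or the file can contain several members.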

Upvotes: 1

Alex Reynolds

Reputation: 96927

Is it possible to determine where the data starts, and strip that portion out?

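A gzip stream starts with the two magic bytes 0x1f 0x8b, normally followed by the compression method 0x08 (deflate) and a flags byte, which is 0x00 when no optional header fields are present: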

static const unsigned char gzipMagicBytes[] = { 0x1f, 0x8b, 0x08, 0x00 };

You can read through a file stream and look for these bytes:

/* needs <stdio.h> and <string.h> */
const char *fn = "foo.bar";
FILE *fp = fopen(fn, "rb");
unsigned char testMagicBuffer[sizeof(gzipMagicBytes)] = {0};
unsigned long long testMagicOffset = 0ULL;

if (fp != NULL) {
    /* read a window of sizeof(gzipMagicBytes) bytes at each offset, moving forward one byte at a time */
    while (fseek(fp, (long)testMagicOffset, SEEK_SET) == 0 &&
           fread(testMagicBuffer, 1, sizeof(gzipMagicBytes), fp) == sizeof(gzipMagicBytes)) {
        if (memcmp(testMagicBuffer, gzipMagicBytes, sizeof(gzipMagicBytes)) == 0) {
            /* we found gzip magic bytes, do stuff here... */
            fprintf(stdout, "gzip stream found at byte offset: %llu\n", testMagicOffset);
            break;
        }
        testMagicOffset++;
    }
    fclose(fp); /* only close a stream that was actually opened */
}

Once you have the offset, you can copy the bytes from there, overwrite other bytes, and so on.
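The magic bytes only tell you where the gzip header begins, not where the compressed data begins. As a rough sketch of the header layout from the gzip format specification (RFC 1952), the function below walks the fixed 10-byte header and the optional fields selected by the flags byte. gzip_data_offset is a name made up for this example, and it works on an in-memory buffer rather than a FILE.

```c
#include <stddef.h>

/* Returns the offset of the first byte of deflate data inside a gzip
   member, or -1 if the header is malformed or truncated.
   Hypothetical helper following the RFC 1952 header layout. */
long gzip_data_offset(const unsigned char *buf, size_t len)
{
    size_t pos = 10;               /* fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS */
    unsigned flags;

    if (buf == NULL || len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return -1;
    flags = buf[3];
    if (flags & 0xe0)              /* reserved flag bits must be zero */
        return -1;

    if (flags & 0x04) {            /* FEXTRA: 2-byte little-endian length, then data */
        size_t xlen;
        if (len - pos < 2)
            return -1;
        xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return -1;
        pos += xlen;
    }
    if (flags & 0x08) {            /* FNAME: zero-terminated file name */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos >= len)
            return -1;
        pos++;
    }
    if (flags & 0x10) {            /* FCOMMENT: zero-terminated comment */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos >= len)
            return -1;
        pos++;
    }
    if (flags & 0x02) {            /* FHCRC: 2-byte header checksum */
        if (len - pos < 2)
            return -1;
        pos += 2;
    }
    return (long)pos;
}
```

The bytes from the returned offset up to the final 8-byte trailer (CRC-32 and size) are a raw deflate stream with no zlib wrapper, so they still need a decoder that accepts raw deflate.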

Upvotes: 0
