code base 5000

Reputation: 4102

Downloading a large file in chunks with gzip encoding (Python 3.4)

If I make a request for a file and specify encoding of gzip, how do I handle that?

Normally when I have a large file I do the following:

while True:
   chunk = resp.read(CHUNK)
   if not chunk: break
   writer.write(chunk)
   writer.flush()

where CHUNK is a size in bytes, writer is a file object returned by open(), and resp is the response returned by a urllib request.

So it's pretty simple most of the time. When the response header contains 'gzip' as the returned encoding, I would do the following:

decomp = zlib.decompressobj(16+zlib.MAX_WBITS)
data = decomp.decompress(resp.read())
writer.write(data)
writer.flush()

or this:

f = gzip.GzipFile(fileobj=buf)
writer.write(f.read())

where the buf is a BytesIO().

If I try to decompress the gzip response chunk by chunk, though, I run into problems:

while True:
   chunk = resp.read(CHUNK)
   if not chunk: break
   decomp = zlib.decompressobj(16+zlib.MAX_WBITS)
   data = decomp.decompress(chunk)
   writer.write(data)
   writer.flush()

Is there a way to decompress the gzip data as it comes down in small chunks? Or do I need to write the whole file to disk, decompress it, and then move it to the final file name? Part of the problem is that I am using 32-bit Python, so reading the whole response at once can cause out-of-memory errors.

Thank you

Upvotes: 3

Views: 1729

Answers (1)

code base 5000

Reputation: 4102

I think I found a solution that I wish to share.

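import zlib

def _chunk(response, size=4096):
    """Yield a web response in pieces, decompressing gzip on the fly."""
    method = response.headers.get("content-encoding")
    if method == "gzip":
        # One decompressor for the whole stream; 16 + MAX_WBITS selects the gzip wrapper.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        b = response.read(size)
        while b:
            data = d.decompress(b)
            if data:
                yield data
            b = response.read(size)
        # Emit anything still buffered inside the decompressor.
        tail = d.flush()
        if tail:
            yield tail
    else:
        while True:
            chunk = response.read(size)
            if not chunk:
                break
            yield chunk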

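If anyone has a better solution, please add it. Basically, my error was where I created the zlib.decompressobj(). I was creating a new one inside the loop for every chunk, so each chunk was decompressed without the state from the chunks before it. It has to be created once, before the loop.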
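To make the point about placement concrete, here is a minimal sketch that can be run without a network connection. The name iter_gunzip and the use of io.BytesIO as a stand-in for the urllib response are assumptions for the demo only. The decompressor is created once and fed every chunk in order.

```python
import gzip
import io
import zlib

def iter_gunzip(fileobj, size=4096):
    """Yield decompressed pieces of a gzip stream read from fileobj."""
    # Created once, outside the loop, so its state carries across chunks.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(size)
        if not chunk:
            break
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit whatever is still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail

# io.BytesIO stands in for the HTTP response here (assumption for the demo).
payload = b"example line\n" * 10000
fake_response = io.BytesIO(gzip.compress(payload))
restored = b"".join(iter_gunzip(fake_response, size=64))
assert restored == payload
```

In real use each yielded piece would go straight to writer.write() rather than being joined, so only one chunk is held in memory at a time.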

This seems to work in both Python 2 and Python 3, which is a plus.
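A possible alternative on Python 3 is to let gzip.GzipFile read from the stream and copy it out in fixed-size pieces. This is only a sketch: save_gunzipped is a hypothetical helper name, io.BytesIO stands in for the response, and I have not checked it against a live urllib response or on Python 2.

```python
import gzip
import io
import shutil

def save_gunzipped(fileobj, writer, size=4096):
    """Copy a gzip-compressed stream to writer, decompressing as it goes."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        # copyfileobj reads and writes `size` bytes at a time.
        shutil.copyfileobj(gz, writer, size)

# io.BytesIO objects stand in for the response and the output file (assumption).
payload = b"example line\n" * 10000
fake_response = io.BytesIO(gzip.compress(payload))
out = io.BytesIO()
save_gunzipped(fake_response, out, size=64)
assert out.getvalue() == payload
```

The trade-off is less control over the loop in exchange for not managing the decompressor by hand.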

Upvotes: 3
