Memory-efficient way to write an uncompressed file from a gzip file

Question

Using Python 3.5.

I am decompressing a gzip file and writing the result to another file. While looking into an out-of-memory problem, I found this example in the docs for the gzip module:

import gzip
import shutil
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

This does compression, and I want decompression, so I take it that I can just reverse the pattern, giving:

with open(unzipped_file, 'wb') as f_out, gzip.open(zipped_file, 'rb') as f_in:
    shutil.copyfileobj(f_in, f_out)

My question is, why did I get into memory trouble with the following:

with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())

Either I laid on the last straw, or I was naive in believing that the files would act like generators and stream the unzip process, taking very little memory. Should these two methods be equivalent?

Vinit Kumar · Accepted Answer

Here is the source of shutil.copyfileobj:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

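It reads the source in chunks of length bytes (16*1024, i.e. 16 KiB, by default in this version), so only one chunk is held in memory at a time. Your second version calls zin.read() with no size argument, which decompresses the entire file into a single bytes object before write() is ever called. Memory use therefore grows with the uncompressed size of the file, and that is what lands you in memory trouble. The two methods produce the same output file, but they are not equivalent in memory use.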
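To make the chunked approach concrete, here is a minimal sketch of a helper that wraps the pattern from the question. The name gunzip_file and its chunk_size parameter are hypothetical, chosen only for illustration; gzip.open and shutil.copyfileobj are the standard library calls already used above.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path, chunk_size=16 * 1024):
    """Decompress the gzip file at src_path into dst_path, chunk by chunk.

    gunzip_file and chunk_size are illustrative names, not part of any library.
    """
    with gzip.open(src_path, 'rb') as f_in, open(dst_path, 'wb') as f_out:
        # copyfileobj repeatedly calls f_in.read(chunk_size) and writes the
        # result, so at most one decompressed chunk is in memory at a time.
        shutil.copyfileobj(f_in, f_out, chunk_size)
```

Calling gunzip_file(zipped_file, unzipped_file) should behave like the shutil.copyfileobj version in the question. A larger chunk_size trades a little more memory for fewer read and write calls.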
