mohawkTrail

Reputation: 626

Memory-efficient way to write an uncompressed file from a gzip file

Using Python 3.5.

I am decompressing a gzip file and writing the result to another file. After running into an out-of-memory problem, I found this example in the docs for the gzip module:

import gzip
import shutil
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

This does compression, and I want decompression, so I take it that I can just reverse the pattern, giving

with open(unzipped_file, 'wb') as f_out, gzip.open(zipped_file, 'rb') as f_in:
    shutil.copyfileobj(f_in, f_out)

My question is, why did I get into memory trouble with the following:

with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())

Either this was the last straw on a machine already short of memory, or I was naive in believing that the file objects would act like generators and stream the decompression, taking very little memory. Should these two methods be equivalent?

Upvotes: 3

Views: 1189

Answers (2)

mohawkTrail

Reputation: 626

This is the memory-hungry (and naive) version from my question, which reads the whole decompressed file into memory before writing it:

import gzip
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())

Based on the earlier answers, I tested this version, which reads and writes one fixed-size block at a time:

import gzip
block_size = 64*1024
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
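    # Copy one decompressed block at a time so that only block_size bytes
    # of output are held in memory at once.
    while True:
        uncompressed_block = zin.read(block_size)
        if not uncompressed_block:  # empty bytes means end of stream
            break
        wout.write(uncompressed_block)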

Verified on a 4.8G file.
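
The same block-by-block loop can also be written with the two-argument form of iter(). This is only a sketch of an equivalent spelling, not something I ran on the 4.8G file; the function name gunzip_in_blocks is made up for illustration.

```python
import gzip
from functools import partial


def gunzip_in_blocks(zipped_file, unzipped_file, block_size=64 * 1024):
    """Decompress zipped_file into unzipped_file one block at a time.

    gunzip_in_blocks is a hypothetical helper name used for this sketch.
    """
    with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
        # iter(callable, sentinel) calls zin.read(block_size) repeatedly
        # and stops when it returns b'' (end of the decompressed stream).
        for block in iter(partial(zin.read, block_size), b''):
            wout.write(block)
```

Memory use should be bounded by roughly block_size of decompressed data plus whatever gzip buffers internally, regardless of the file size.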

Upvotes: 0

Vinit Kumar

Reputation: 517

Here is the source of the shutil.copyfileobj function:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

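It reads the source in chunks of 16*1024 bytes (16 KiB) and writes each chunk before reading the next, so only one chunk is held in memory at a time. Your version is not the reverse of that process: zin.read() with no size argument reads the entire decompressed file into memory before anything is written, and for a large file that is what causes the memory problem.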
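
As a rough sketch of the streaming alternative, the decompression can go through shutil.copyfileobj, with the chunk size passed explicitly as its third argument. The function name gunzip_with_copyfileobj and the 1 MiB chunk size are arbitrary choices made for this example.

```python
import gzip
import shutil


def gunzip_with_copyfileobj(zipped_file, unzipped_file, length=1024 * 1024):
    """Stream-decompress zipped_file into unzipped_file.

    gunzip_with_copyfileobj is a hypothetical helper name; length is the
    number of decompressed bytes read per chunk (1 MiB is an arbitrary choice).
    """
    with gzip.open(zipped_file, 'rb') as f_in, open(unzipped_file, 'wb') as f_out:
        # copyfileobj loops over f_in.read(length) and writes each chunk,
        # so only one chunk of decompressed data is in memory at a time.
        shutil.copyfileobj(f_in, f_out, length)
```

A larger length means fewer read and write calls at the cost of a bigger buffer; whether it makes a measurable difference depends on the system.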

Upvotes: 3
