Reputation: 626
using Python 3.5
I am decompressing a gzip file and writing the result to another file. After looking into an out-of-memory problem, I found this example in the docs for the gzip module:
import gzip
import shutil
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
This does compression, and I want decompression, so I take it that I can just reverse the pattern, giving
with open(unzipped_file, 'wb') as f_out, gzip.open(zipped_file, 'rb') as f_in:
    shutil.copyfileobj(f_in, f_out)
My question is, why did I get into memory trouble with the following:
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())
Either this was the last straw memory-wise, or I was naive in believing that the file objects would act like generators and stream the decompression, using very little memory. Should these two methods be equivalent?
Upvotes: 3
Views: 1189
Reputation: 626
Instead of the memory-hungry (and naive)
import gzip
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())
Based on the earlier answers I tested this:
import gzip
block_size = 64*1024
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    while True:
        uncompressed_block = zin.read(block_size)
        if not uncompressed_block:
            break
        wout.write(uncompressed_block)
Verified on a 4.8G file.
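For what it's worth, the same chunked copy can be written with shutil.copyfileobj by pointing it at the GzipFile as the source; here is a minimal sketch under the same assumptions (zipped_file and unzipped_file are paths defined elsewhere):
import gzip
import shutil

block_size = 64 * 1024
# copyfileobj calls zin.read(block_size) in a loop, so at most one
# block of decompressed data is held in memory at a time
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    shutil.copyfileobj(zin, wout, block_size)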
Upvotes: 0
Reputation: 517
Here is the shutil.copyfileobj method:
def copyfileobj(fsrc, fdst, length=16*1024):
"""copy data from file-like object fsrc to file-like object fdst"""
while 1:
buf = fsrc.read(length)
if not buf:
break
fdst.write(buf)
It reads the file in chunks of 16*1024 bytes. When you reverse the process with zin.read() and no size argument, you take no account of the size of the file: the entire decompressed content is read into memory at once, and that is what lands you in memory trouble.
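To make the difference concrete, here is a small sketch (the file name is just a placeholder): read() with no size asks the GzipFile to decompress and return the whole file as one bytes object, while read(n) returns at most n bytes per call, which is what copyfileobj relies on.
import gzip

with gzip.open('example.txt.gz', 'rb') as zin:
    whole = zin.read()            # whole decompressed file in memory at once

with gzip.open('example.txt.gz', 'rb') as zin:
    chunk = zin.read(16 * 1024)   # at most 16 KiB per call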
Upvotes: 3