cclloyd

Reputation: 9195

Compressing large files with gzip in Python

I searched for how to compress a file in Python and found an answer that basically did the following:

import gzip

with open(input_file, 'rb') as f_in, gzip.open(output_file, 'wb') as f_out:
    f_out.write(f_in.read())

It works fine with a 1 GB file. But I plan on compressing files up to 200 GB.

Are there any considerations I need to take into account? Is there a different way I should be doing it with large files like that?

The files are binary .img files (exports of a block device; usually with empty space at the end, thus the compression works wonderfully).

Upvotes: 3

Views: 3309

Answers (1)

ti7

Reputation: 18792

This will read the entire file into memory at once, causing problems for you if you don't have 200 GB of RAM available!

You may be able to simply pipe the file through the gzip command-line tool, avoiding Python entirely; gzip will do the work in chunks on its own:

% gzip -c myfile.img > myfile.img.gz
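If you want to drive that from a Python script rather than a shell, a minimal sketch using the standard subprocess module might look like the following (the file names are the hypothetical ones from the example above, and the gzip command-line tool is assumed to be installed and on your PATH):

import subprocess

# Hypothetical file names; the gzip CLI is assumed to be available.
input_path = "myfile.img"
output_path = "myfile.img.gz"

# Stream gzip's stdout straight into the output file so nothing
# large is buffered in Python's memory.
with open(output_path, "wb") as f_out:
    subprocess.run(["gzip", "-c", input_path], stdout=f_out, check=True)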

Otherwise, you should read the file in chunks (picking a large block size may provide some benefit):

import gzip

BLOCK_SIZE = 8192  # bytes per read; a larger block size may reduce overhead

with open(myfile, "rb") as f_in, gzip.open(output_file, 'wb') as f_out:
    while True:
        content = f_in.read(BLOCK_SIZE)
        if not content:  # an empty bytes object means end of file
            break
        f_out.write(content)
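As a side note, the chunked loop above is essentially what the standard library's shutil.copyfileobj does for you; a minimal sketch using the same assumed myfile / output_file names:

import gzip
import shutil

# copyfileobj copies in fixed-size chunks internally, so memory use
# stays flat regardless of the input file's size.
with open(myfile, "rb") as f_in, gzip.open(output_file, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)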

Upvotes: 3
