llanato

Reputation: 2491

Read large file in chunks, compress and write in chunks

I've run into an issue processing large files. The files are gradually increasing in size and will continue to do so. I can only use deflate as a compression option because of limitations in the third-party application I upload the compressed file to.

The server running the script has limited memory, so loading a whole file at once runs into memory problems. That is why I'm trying to read and write in chunks, with the output being the required deflated file.

Up to this point I've been using this snippet to compress the files, and it worked fine until now, when the files have become too big to process/compress in one go.

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))

These are some of the options I've tried to get around it, all of which have failed to work properly so far.

1)

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)

2)

BLOCK_SIZE = 64

compressor = zlib.compressobj(1)

filename = file_path_partial

with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))

Upvotes: 2

Views: 2634

Answers (1)

gelonida

Reputation: 5630

The example below reads the input in 64 KiB chunks, modifies each block, and writes it out to a gzip file.

Is this what you want?

import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536)  # read in 64 KiB blocks
        if not block:
            break
        # comment out the next line to write the data through unchanged
        block = block.replace(b"a", b"A")
        fout.write(block)
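
Note that `gzip.GzipFile` writes a gzip container, which is not the same byte format as the zlib stream that `zlib.compress` produced in the question. If the upload target needs that zlib/deflate output, a minimal sketch along the same lines could use `zlib.compressobj` instead. The function name `compress_file_in_chunks` and the chunk size are my own choices for illustration, not anything from the question. The important part is the final `flush()` call: without it the data still buffered in the compressor is never written and the output file is truncated.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary choice, tune as needed


def compress_file_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Stream src_path into dst_path as a zlib (deflate) stream.

    Hypothetical helper for illustration; only one chunk is held
    in memory at a time.
    """
    compressor = zlib.compressobj()  # default level, zlib header and trailer
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        while True:
            block = fin.read(chunk_size)
            if not block:
                break
            # compress() may return b"" while data is buffered internally
            fout.write(compressor.compress(block))
        # flush() emits whatever is still buffered plus the stream trailer
        fout.write(compressor.flush())
```

The resulting file should decompress with `zlib.decompress`, like the output of the one-shot `zlib.compress` call. If the third-party application actually expects raw deflate data without the zlib header, `zlib.compressobj(wbits=-15)` produces that, but which variant it wants is an assumption to check against the application.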

Upvotes: 2
