Putnik

Reputation: 6824

How to stream from ZipFile? How to zip "on the fly"?

I want to zip a stream and stream out the result. I'm doing this in AWS Lambda, which matters because of the limited disk space and other restrictions. If it matters, I'm going to use the zipped stream to write an AWS S3 object using upload_fileobj() or put().

As long as the objects are small, I can create the archive as a file:

import zipfile
zf = zipfile.ZipFile("/tmp/byte.zip", "w")
zf.writestr(filename, my_stream.read())
zf.close()

For a larger amount of data I can write to an in-memory object instead of a file:

from io import BytesIO
...
byte = BytesIO()
zf = zipfile.ZipFile(byte, "w")
....

But how can I pass the zipped stream to the output? If I call zf.close(), the stream will be closed; if I don't, the archive will be incomplete.

Upvotes: 8

Views: 16662

Answers (2)

Michal Charemza

Reputation: 27022

Instead of using Python's built-in zipfile, you can use stream-zip (full disclosure: written by me).

If you have an iterable of bytes, say my_data_iter, you can get an iterable of the bytes of a zip file using its stream_zip function:

from datetime import datetime
from stream_zip import stream_zip, ZIP_64

def files():
    modified_at = datetime.now()
    perms = 0o600
    yield 'my-file-1.txt', modified_at, perms, ZIP_64, my_data_iter

my_zip_iter = stream_zip(files())
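
If the source is a file-like object rather than an iterable (such as my_stream in the question), one way to get an iterable of bytes is to read it in fixed-size chunks. This is a minimal sketch using only the standard library; iter_chunks and the 64 KiB chunk size are names and values chosen here for illustration, not part of stream-zip.

```python
from functools import partial
from io import BytesIO


def iter_chunks(stream, chunk_size=65536):
    """Yield successive chunks from a binary file-like object until EOF."""
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # until it returns b"", which is what read() gives at end of stream.
    yield from iter(partial(stream.read, chunk_size), b"")


# Stand-in for the question's my_stream (assumption: any binary stream works)
my_stream = BytesIO(b"x" * 150_000)
my_data_iter = iter_chunks(my_stream)

chunk_lengths = [len(chunk) for chunk in my_data_iter]
```

Only one chunk is held in memory at a time, so the chunk size is the knob that trades memory for the number of read calls.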

If you need a file-like object of the zipped bytes, say to pass to boto3's upload_fileobj, you can convert from the iterable with a transformation function, like the one from to-file-like-obj (also written by me).

import boto3
from to_file_like_obj import to_file_like_obj

# Convert iterable to file-like object
my_file_like_obj = to_file_like_obj(my_zip_iter)

# Upload to S3 (likely using a multipart upload)
s3 = boto3.client('s3')
s3.upload_fileobj(my_file_like_obj, 'my-bucket', 'my.zip')
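
To illustrate what such a conversion involves, here is a rough standard-library sketch of wrapping an iterable of bytes in a readable file-like object. It is not the implementation of to-file-like-obj; IterableReader and iterable_to_file_like are hypothetical names used only for this example.

```python
import io


class IterableReader(io.RawIOBase):
    """Hypothetical read-only raw stream backed by an iterable of bytes."""

    def __init__(self, iterable):
        self._chunks = iter(iterable)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull chunks until there is data to hand out, skipping empty ones.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # 0 bytes read signals end of stream
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def iterable_to_file_like(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """Wrap the raw reader so read(n) returns n bytes unless EOF is hit."""
    return io.BufferedReader(IterableReader(iterable), buffer_size=buffer_size)


demo = iterable_to_file_like([b"abc", b"", b"defgh", b"ij"])
first = demo.read(4)
rest = demo.read()
```

The wrapper pulls from the iterable only when the consumer calls read, so the zip is produced at the pace the upload consumes it.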

Upvotes: 15

meuh

Reputation: 12255

You might like to try the zipstream version of zipfile. For example, to compress stdin to stdout as a zip file that holds the data in a member named TheLogFile, using iterators:

#!/usr/bin/python3
import sys, zipstream
with zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED) as z:
    z.write_iter('TheLogFile', sys.stdin.buffer)
    for chunk in z:
        sys.stdout.buffer.write(chunk)
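
For comparison, a similar iterator pattern can be approximated with the built-in zipfile alone, because it can write to a non-seekable output. The following is only a sketch under that assumption, not how zipstream works internally; ChunkSink and zip_stream are hypothetical names introduced for this example.

```python
import io
import zipfile


class ChunkSink:
    """Write-only, non-seekable sink that queues whatever ZipFile writes."""

    def __init__(self):
        self._pending = []

    def write(self, data):
        self._pending.append(bytes(data))
        return len(data)

    def flush(self):
        pass

    def drain(self):
        # Hand over everything written so far and start a fresh queue.
        chunks, self._pending = self._pending, []
        return chunks


def zip_stream(name, data_iter):
    """Yield the bytes of a one-member zip archive as the input is consumed."""
    sink = ChunkSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        # force_zip64 allows members over 2 GiB whose size is not known upfront.
        with zf.open(name, mode="w", force_zip64=True) as member:
            for chunk in data_iter:
                member.write(chunk)
                yield from sink.drain()
    # Closing the member and the archive writes the trailing records.
    yield from sink.drain()


payload = [b"first chunk, ", b"second chunk"]
archive = b"".join(zip_stream("TheLogFile", payload))

# Round-trip check: read the member back from the streamed archive.
with zipfile.ZipFile(io.BytesIO(archive)) as check:
    restored = check.read("TheLogFile")
```

Because the sink has no tell() or seek(), zipfile writes sizes and the CRC after each member's data instead of going back to patch the header, which is what makes producing the archive in one forward pass possible.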

Upvotes: 4
