Reputation: 6824
I want to zip a stream and stream out the result. I'm doing this in AWS Lambda, which matters because of the limited disk space and other restrictions.
I'm going to use the zipped stream to write an AWS S3 object using upload_fileobj() or put(), if it matters.
I can create the archive as a file as long as the objects are small:
import zipfile
zf = zipfile.ZipFile("/tmp/byte.zip", "w")
zf.writestr(filename, my_stream.read())
zf.close()
For a large amount of data I can create an in-memory object instead of a file:
from io import BytesIO
...
byte = BytesIO()
zf = zipfile.ZipFile(byte, "w")
....
But how can I pass the zipped stream to the output? If I call zf.close(), the stream will be closed; if I don't, the archive will be incomplete.
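A quick check with the standard library suggests the buffer itself survives: as far as the zipfile module goes, ZipFile.close() finalizes the archive but only closes a file it opened itself, not a file object passed in by the caller. A minimal sketch, assuming the data fits in memory (the helper name zip_to_buffer, the file name and the payload are made up for illustration):

```python
import zipfile
from io import BytesIO

def zip_to_buffer(name, data):
    """Zip `data` (bytes) under `name` into an in-memory buffer, rewound for reading."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    # Leaving the with-block calls zf.close(), which writes the central directory.
    # buf was supplied by the caller, so it is still open at this point.
    buf.seek(0)
    return buf

buf = zip_to_buffer("hello.txt", b"hello world")
```

The rewound buffer could then be handed to upload_fileobj(). It still holds the whole archive in memory, so it does not help once the input is too large for that.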
Upvotes: 8
Views: 16662
Reputation: 27022
Instead of using Python's built-in zipfile, you can use stream-zip (full disclosure: written by me).
If you have an iterable of bytes, my_data_iter say, you can get an iterable of a zip file using its stream_zip function:
from datetime import datetime
from stream_zip import stream_zip, ZIP_64
def files():
    modified_at = datetime.now()
    perms = 0o600
    yield 'my-file-1.txt', modified_at, perms, ZIP_64, my_data_iter
my_zip_iter = stream_zip(files())
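The snippet above assumes my_data_iter already exists. If the source is a binary file-like object, such as my_stream in the question, one way to get an iterable of bytes is a small generator that reads fixed-size chunks. This is only a sketch: iter_chunks and the 64 KiB chunk size are illustrative choices, not part of stream-zip.

```python
from io import BytesIO

def iter_chunks(stream, chunk_size=65536):
    """Yield successive chunks from a binary file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk

# Stand-in for the real input stream, for illustration only
my_stream = BytesIO(b"a" * 150_000)
my_data_iter = iter_chunks(my_stream)
```

Only one chunk is held in memory at a time, which is the point when the input is large.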
If you need a file-like object of the zipped bytes, say to pass to boto3's upload_fileobj, you can convert the iterable with a transformation function, like the one from to-file-like-obj (also written by me).
import boto3
from to_file_like_obj import to_file_like_obj
# Convert iterable to file-like object
my_file_like_obj = to_file_like_obj(my_zip_iter)
# Upload to S3 (likely using a multipart upload)
s3 = boto3.client('s3')
s3.upload_fileobj(my_file_like_obj, 'my-bucket', 'my.zip')
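To illustrate the idea behind such a conversion, here is a rough sketch of a read-only wrapper that serves read(size) calls from an iterable of bytes. It is not the to-file-like-obj implementation, and the class name IterableReader is hypothetical. It only shows why the approach works: upload_fileobj needs an object with a read method that returns bytes.

```python
class IterableReader:
    """Minimal read-only file-like wrapper around an iterable of bytes."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buf = b""

    def read(self, size=-1):
        # No size given: drain whatever is left in the iterable
        if size is None or size < 0:
            data = self._buf + b"".join(self._it)
            self._buf = b""
            return data
        # Pull chunks until enough bytes are buffered or the iterable ends
        while len(self._buf) < size:
            chunk = next(self._it, None)
            if chunk is None:
                break
            self._buf += chunk
        data, self._buf = self._buf[:size], self._buf[size:]
        return data

reader = IterableReader([b"abc", b"de", b"fghij"])
first = reader.read(4)
```

Because chunks are pulled only when read is called, the zip is produced lazily as the upload consumes it.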
Upvotes: 15
Reputation: 12255
You might like to try the zipstream version of zipfile. For example, to compress stdin to stdout as a zip file holding the data as a file named TheLogFile, using iterators:
#!/usr/bin/python3
import sys, zipstream
with zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED) as z:
    z.write_iter('TheLogFile', sys.stdin.buffer)
    for chunk in z:
        sys.stdout.buffer.write(chunk)
Upvotes: 4