Reputation: 1063
Around 60 CSV files being generated daily in my S3 bucket. The average size of each file is around 500MB. I want to zip all these files through lambda function on the fly(without downloading a file inside Lambda execution) and upload these zipped files to another s3 bucket. I came across these solutions 1 and 2 but I am still getting issue in the implementation. Right now, I am trying to stream CSV file data into a zipped file(this zip file is being created in Lambda tmp directory) and then uploading on s3. But I am getting this error message while writing into zip file:
[Errno 36] File name too long
This is my test Lambda function where I am just trying with one file but in actual case I need to zip 50-60 CSV files individually:
import boto3
import zipfile
def lambda_handler(event, context):
s3 = boto3.resource('s3')
iterator = s3.Object('bucket-name', 'file-name').get()['Body'].iter_lines()
my_zip = zipfile.ZipFile('/tmp/test.zip', 'w')
for line in iterator:
my_zip.write(line)
s3_resource.meta.client.upload_fileobj(file-name, "another-bucket-name", "object-name")
Also, is there a way where I can stream data from my CSV file, zip it and upload it to another s3 bucket without actually saving a full zip file on Lambda memory?
Upvotes: 0
Views: 3361
Reputation: 1063
After lot of research and trials, I am able to make it work. I used smart_open library for my issue and managed to zip 550MB file with just 150MB memory usage in my Lambda. To use external library, I had to use Layers in Lambda. Here is my code:
from smart_open import open, register_compressor
import lzma, os
def lambda_handler(event, context):
with open('s3://bucket-name-where-large-file/file-key-name') as fin:
with open('s3://bucket-name-to-put-zip-file/zip-file-key-name', 'w') as fout:
for line in fin:
fout.write(line)
Please note, smart_open supports .gz
and .bz2
file compression. If you want to zip file in other formats, you can create your own compressor using register_compressor
method of this library.
Upvotes: 1