Create a zip file on S3 from CSV files on S3 using Lambda

Question

Around 60 CSV files are generated daily in my S3 bucket, averaging about 500 MB each. I want to zip all these files with a Lambda function on the fly (without downloading a file inside the Lambda execution) and upload the zipped files to another S3 bucket. I came across solutions 1 and 2, but I am still having trouble with the implementation. Right now, I am trying to stream the CSV data into a zip file (created in the Lambda /tmp directory) and then upload it to S3. But I get this error while writing into the zip file: [Errno 36] File name too long

This is my test Lambda function, where I am trying with just one file; in the actual case I need to zip 50-60 CSV files individually:

import boto3
import zipfile


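def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    # Stream the CSV object line by line instead of reading it all at once.
    iterator = s3.Object('bucket-name', 'file-name').get()['Body'].iter_lines()
    my_zip = zipfile.ZipFile('/tmp/test.zip', 'w')
    for line in iterator:
        # This is the call that fails: ZipFile.write() expects a file path,
        # so each CSV line is treated as a file name ([Errno 36]).
        my_zip.write(line)
    my_zip.close()

    # Upload the zip file from /tmp to the destination bucket.
    s3.meta.client.upload_file('/tmp/test.zip', 'another-bucket-name', 'object-name')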

Also, is there a way to stream data from my CSV file, zip it, and upload it to another S3 bucket without actually holding the full zip file in Lambda memory?

Raman Balyan · Accepted Answer

After a lot of research and trials, I was able to make it work. I used the smart_open library and managed to zip a 550 MB file with just 150 MB of memory usage in my Lambda. To use an external library, I had to use Lambda Layers. Here is my code:

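from smart_open import open


def lambda_handler(event, context):
    # smart_open streams the source object, so it is never fully held in memory.
    with open('s3://bucket-name-where-large-file/file-key-name') as fin:
        # Compression is inferred from the key's extension, so the output key
        # should end in .gz or .bz2 (the .gz suffix here is an assumed example).
        with open('s3://bucket-name-to-put-zip-file/zip-file-key-name.gz', 'w') as fout:
            for line in fin:
                fout.write(line)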

Please note that smart_open supports .gz and .bz2 compression out of the box. If you want to compress to other formats, you can register your own compressor with the library's register_compressor function.
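
If an actual .zip archive is needed rather than .gz, the standard zipfile module can also take streamed data. The error in the question comes from ZipFile.write(), which expects the path of a file to add, not the data itself. Below is a minimal sketch of writing streamed chunks into a single zip member instead; the helper name stream_to_zip and the sample rows are hypothetical, and an in-memory buffer stands in for the S3 body so the example runs locally.

```python
import io
import zipfile


def stream_to_zip(chunks, zip_target, member_name):
    """Write an iterable of bytes chunks into one compressed zip member."""
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # ZipFile.open(name, "w") returns a writable stream for a new member;
        # force_zip64 lets the member grow past 2 GiB if it ever needs to.
        with archive.open(member_name, "w", force_zip64=True) as member:
            for chunk in chunks:
                member.write(chunk)


# Local demonstration: in-memory rows stand in for the streamed S3 object.
rows = [b"id,name\n", b"1,alice\n", b"2,bob\n"]
buffer = io.BytesIO()
stream_to_zip(rows, buffer, "data.csv")

# Read the archive back to confirm the member holds the original bytes.
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
    restored = archive.read("data.csv")
```

In a Lambda function, chunks could presumably come from the object body's iter_chunks() and zip_target could be a path such as '/tmp/test.zip', uploaded afterwards with upload_file. Note that this still writes the whole archive to /tmp, so unlike the smart_open approach it does not avoid local storage.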
