sam Adams

Reputation: 41

Zipping files in s3 using AWS Lambda (Python)

I have a few hundred PDFs in an S3 bucket, and I want a Lambda function that creates a zip file containing all of them.

Doing this locally in Python is obviously easy enough, and I had assumed the logic would transfer over to AWS Lambda in a pretty straightforward way. But so far I haven't managed to get this working.

I have been using the zipfile Python library, as well as boto3. My logic is as simple as finding all the files, appending them to a `files_to_zip` list, and then iterating through that list, writing each one to the new zip file.

This, however, has kicked up a number of issues, and I think that is due to my shortfalls in understanding how calling and loading files works in Lambda.

Here is the code I have tried so far:

    import os
    import boto3
    from io import BytesIO, StringIO
    from zipfile import ZipFile, ZIP_DEFLATED

    def zipping_files(event, context):
        s3 = boto3.resource('s3')

        BUCKET = 'BUCKET NAME'
        PREFIX_1 = 'KEY NAME'
        new_zip = r'NEW KEY NAME' 
        s3_client = boto3.client('s3')
        files_to_zip = []
        response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)

        all = response['Contents']     
        for i in all:
            files_to_zip.append(str(i['Key']))

        with ZipFile(new_zip, 'w',  compression=ZIP_DEFLATED, allowZip64=True) as new_zip:
            for file in files_to_zip:
                new_zip.write(file) 

I am getting error messages such as my `new_zip` path not existing (`FileNotFoundError`) and that this is a read-only action.

Upvotes: 3

Views: 9769

Answers (2)

Fronto

Reputation: 384

Here is how we can solve this:

    import os
    import tempfile
    import boto3
    import botocore
    from zipfile import ZipFile, ZIP_DEFLATED

    def zipping_files(event, context):
        BUCKET = 'BUCKET NAME'
        PREFIX_1 = 'KEY NAME'
        s3_client = boto3.client('s3')

        response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)
        files_to_zip = [str(obj['Key']) for obj in response['Contents']]

        # We download all files to the /tmp directory of Lambda; for that we
        # recreate the S3 key structure (subdirectories) under /tmp.
        for key in files_to_zip:
            try:
                local_file_name = os.path.join('/tmp', key)
                os.makedirs(os.path.dirname(local_file_name), exist_ok=True)
                s3_client.download_file(BUCKET, key, local_file_name)
            except botocore.exceptions.ClientError as e:
                print(e.response)

        # Now create an empty zip file in the /tmp directory; use the suffix
        # .zip since we are writing a zip archive.
        with tempfile.NamedTemporaryFile('w', suffix='.zip', delete=False) as f:
            with ZipFile(f.name, 'w', compression=ZIP_DEFLATED, allowZip64=True) as zip:
                for file in files_to_zip:
                    zip.write(os.path.join('/tmp', file), arcname=file)

        # Once zipped in /tmp, copy it to your preferred S3 location,
        # e.g. 'out/filename.zip'.
        s3_client.upload_file(f.name, BUCKET, 'out/filename.zip')
        print('All files zipped successfully!')
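If the PDFs are small enough to fit in the Lambda's memory, you can also skip /tmp entirely and build the archive in a `BytesIO` buffer. Below is a minimal sketch: the `zip_blobs` helper is hypothetical (not from either answer), and the commented boto3 wiring assumes placeholder bucket and key names.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

def zip_blobs(blobs):
    """Zip a {name: bytes} mapping into an in-memory buffer and return it."""
    buf = io.BytesIO()
    with ZipFile(buf, 'w', compression=ZIP_DEFLATED) as zf:
        for name, data in blobs.items():
            zf.writestr(name, data)
    buf.seek(0)  # rewind so the buffer can be read from the start
    return buf

# Hedged Lambda wiring (BUCKET/PREFIX are placeholders, untested here):
# import boto3
# s3 = boto3.client('s3')
# keys = [o['Key'] for o in
#         s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)['Contents']]
# blobs = {k: s3.get_object(Bucket=BUCKET, Key=k)['Body'].read() for k in keys}
# s3.put_object(Bucket=BUCKET, Key='out/all_pdfs.zip',
#               Body=zip_blobs(blobs).getvalue())
```

This avoids filesystem permissions issues altogether, at the cost of holding every object in memory at once.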

Upvotes: 2

Freek Wiekmeijer

Reputation: 4950

Your code sample attempts to create a local file named `NEW KEY NAME` on the local filesystem of the Lambda function's container, in the default working directory (which is `/var/task`, afaik). That directory is read-only, hence the errors.

Step 1: make a decent file path in the /tmp directory, i.e. os.path.join('/tmp', target_filename).

Step 2: your code is not uploading the zip file to S3. Add a call to `s3_client.put_object`.
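The two steps combined might look like the sketch below. The `tmp_zip_path` helper and the key names are illustrative, not from the original code; the boto3 wiring is left as hedged comments since it can't be run outside Lambda.

```python
import os

def tmp_zip_path(target_filename):
    # Step 1: place the archive under /tmp, the only writable
    # directory inside a Lambda container.
    return os.path.join('/tmp', target_filename)

# Step 2 wiring (hedged; BUCKET and the destination key are placeholders):
# import boto3
# from zipfile import ZipFile, ZIP_DEFLATED
# s3_client = boto3.client('s3')
# zip_path = tmp_zip_path('all_pdfs.zip')
# with ZipFile(zip_path, 'w', compression=ZIP_DEFLATED) as zf:
#     ...  # zf.write(...) each downloaded file
# with open(zip_path, 'rb') as body:
#     s3_client.put_object(Bucket=BUCKET, Key='out/all_pdfs.zip', Body=body)
```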

Upvotes: 0
