Phong Vu

Reputation: 2896

how to download S3 file in Serverless Lambda (Python)

I created a lambda in Python (using Serverless), which will be triggered by a SQS message.

handler.py

import json
import logging

import boto3

import const

s3 = boto3.resource('s3')

def process(event, context):
    response = None
    # for record in event['Records']:
    record = event['Records'][0]
    message = dict()
    try:
        message = json.loads(record['body'])

        s3.meta.client.download_file(const.bucket_name, 'class/raw/photo/' + message['photo_name'], const.raw_filepath + message['photo_name'])    

        ...

        response = {
            "statusCode": 200,
            "body": json.dumps(event)
        }

    except Exception as ex:
        error_msg = 'JOB_MSG: {}, EXCEPTION: {}'.format(message, ex)
        logging.error(error_msg)

        response = {
            "statusCode": 500,
            "body": json.dumps(str(ex))  # Exception objects are not JSON serializable
        }

    return response

const.py

bucket_name = 'test'
raw_filepath = '/var/task/raw/'

I created a folder "raw" at the same level as handler.py and then deployed the serverless Lambda.

I get the following error (from CloudWatch) when the Lambda is triggered:

No such file or directory: u'/var/task/raw/Student001.JPG.94BBBAce'

As I understand it, the Lambda package folder is not writable, so the folder cannot be created there.
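For reference, /var/task (where the deployment package lands) is read-only; as far as I can tell, the only writable location is /tmp. So I suspect the fix is something like this in const.py, creating the folder at runtime since it does not ship with the package:

```python
import os

bucket_name = 'test'
raw_filepath = '/tmp/raw/'  # /tmp is the only writable path in Lambda

# the folder does not exist at cold start, so create it at import time
os.makedirs(raw_filepath, exist_ok=True)
```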

In case it matters for best practices, these are the objectives of the Lambda:

Any suggestion is appreciated.

Upvotes: 11

Views: 33676

Answers (2)

adjr2

Reputation: 53

In one of my projects I converted webp files to jpg. You can refer to the following GitHub link to get some understanding:

https://github.com/adjr2/webp-to-jpg/blob/master/codes.py

You can directly access the file you download in the Lambda function. I am not sure whether you can create a new folder (I am pretty new to all this myself), but you can certainly manipulate the file and upload it back to the same (or a different) S3 bucket.
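The pattern from that repo can be sketched roughly like this (the names and the `class/processed/` key are illustrative; `s3` would be a boto3 client, passed in here so the helper is easy to test):

```python
import os

def process_photo(s3, bucket, key):
    # download the object to /tmp (the only writable dir in Lambda),
    # manipulate it, then upload the result back to S3
    local = os.path.join('/tmp', os.path.basename(key))
    s3.download_file(bucket, key, local)
    # ... transform the local file here (e.g. convert webp to jpg) ...
    s3.upload_file(local, bucket, 'class/processed/' + os.path.basename(key))
```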

Hope it helps. Cheers!

Upvotes: 1

Milan Cermak

Reputation: 8064

If you need to download the object to the disk, you can use tempfile and download_fileobj to save it:

import tempfile

with tempfile.TemporaryFile() as f:
    s3.meta.client.download_fileobj(const.bucket_name,
                                    'class/raw/photo/' + message['photo_name'],
                                    f)
    f.seek(0)
    # continue processing f

Note that there's a 512 MB limit on the size of temporary files in Lambda.
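If a downstream library insists on a real filename rather than a file object, a NamedTemporaryFile works as well. A sketch, with an injected `s3` client so it stays testable (the helper name is made up):

```python
import os
import tempfile

def download_to_named_tmp(s3, bucket, key):
    # NamedTemporaryFile lives under /tmp in Lambda; delete=False keeps the
    # file around after closing so other code can open it by name
    suffix = os.path.splitext(key)[1]
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.close()
    s3.download_file(bucket, key, tmp.name)
    return tmp.name
```

Remember to delete the file yourself when you are done with it.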

I would argue an even better way is to process it all in memory. Instead of tempfile, you can use io in a very similar fashion:

import io

data_stream = io.BytesIO()
s3.meta.client.download_fileobj(const.bucket_name,
                                'class/raw/photo/' + message['photo_name'],
                                data_stream)
data_stream.seek(0)

This way, the data never needs to be written to disk, which is a) faster and b) lets you process bigger files, basically until you hit Lambda's memory limit of 3008 MB.
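A hypothetical round trip with the in-memory approach might look like this (the `transform` callback and output key are made up; `upload_fileobj` is the boto3 counterpart for writing the stream back):

```python
import io

def roundtrip(s3, bucket, key, transform):
    # pull the object into memory, transform the raw bytes, and push the
    # result back to S3 without ever touching the disk
    buf = io.BytesIO()
    s3.download_fileobj(bucket, key, buf)
    buf.seek(0)
    out = io.BytesIO(transform(buf.read()))
    s3.upload_fileobj(out, bucket, 'class/processed/' + key.rsplit('/', 1)[-1])
```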

Upvotes: 16
