Phong Vu

Reputation: 2896

how to download S3 file in Serverless Lambda (Python)

I created a lambda in Python (using Serverless), which will be triggered by a SQS message.

handler.py

import json
import logging

import boto3

import const

s3 = boto3.resource('s3')

def process(event, context):
    response = None
    # for record in event['Records']:
    record = event['Records'][0]
    message = dict()
    try:
        message = json.loads(record['body'])

        s3.meta.client.download_file(const.bucket_name, 'class/raw/photo/' + message['photo_name'], const.raw_filepath + message['photo_name'])    

        ...

        response = {
            "statusCode": 200,
            "body": json.dumps(event)
        }

    except Exception as ex:
        error_msg = 'JOB_MSG: {}, EXCEPTION: {}'.format(message, ex)
        logging.error(error_msg)

        response = {
            "statusCode": 500,
            "body": json.dumps(str(ex))  # Exception objects are not JSON serializable
        }

    return response

const.py

bucket_name = 'test'
raw_filepath = '/var/task/raw/'

I created a folder "raw" at the same level as handler.py and then deployed the serverless Lambda.

I get the following error (from CloudWatch) when the Lambda is triggered:

No such file or directory: u'/var/task/raw/Student001.JPG.94BBBAce'

As I understand it, the Lambda package folder is not writable, so the folder cannot be created there.
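For reference, /var/task (where the deployment package lands) is read-only; as far as I can tell, the only writable location is /tmp. So I suspect the fix is something like this in const.py, creating the folder at runtime since it does not ship with the package:

```python
import os

bucket_name = 'test'
raw_filepath = '/tmp/raw/'  # /tmp is the only writable path in Lambda

# the folder does not exist at cold start, so create it at import time
os.makedirs(raw_filepath, exist_ok=True)
```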

In case it matters for best practices, these are the objectives of the Lambda:

Any suggestion is appreciated.

Upvotes: 11

Views: 33676

Answers (2)

adjr2

Reputation: 53

In one of my projects I converted webp files to jpg. You can refer to the following GitHub link to get some understanding:

https://github.com/adjr2/webp-to-jpg/blob/master/codes.py

You can directly access the file you download in the Lambda function. I am not sure whether you can create a new folder (I am pretty new to all this myself), but you can certainly manipulate the file and upload it back to the same (or a different) S3 bucket.
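The pattern from that repo can be sketched roughly like this (the names and the `class/processed/` key are illustrative; `s3` would be a boto3 client, passed in here so the helper is easy to test):

```python
import os

def process_photo(s3, bucket, key):
    # download the object to /tmp (the only writable dir in Lambda),
    # manipulate it, then upload the result back to S3
    local = os.path.join('/tmp', os.path.basename(key))
    s3.download_file(bucket, key, local)
    # ... transform the local file here (e.g. convert webp to jpg) ...
    s3.upload_file(local, bucket, 'class/processed/' + os.path.basename(key))
```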

Hope it helps. Cheers!

Upvotes: 1

Milan Cermak

Reputation: 8064

If you need to download the object to the disk, you can use tempfile and download_fileobj to save it:

import tempfile

with tempfile.TemporaryFile() as f:
    s3.meta.client.download_fileobj(const.bucket_name,
                                    'class/raw/photo/' + message['photo_name'],
                                    f)
    f.seek(0)
    # continue processing f

Note that there's a 512 MB limit on the size of temporary files in Lambda.
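If a downstream library insists on a real filename rather than a file object, a NamedTemporaryFile works as well. A sketch, with an injected `s3` client so it stays testable (the helper name is made up):

```python
import os
import tempfile

def download_to_named_tmp(s3, bucket, key):
    # NamedTemporaryFile lives under /tmp in Lambda; delete=False keeps the
    # file around after closing so other code can open it by name
    suffix = os.path.splitext(key)[1]
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    tmp.close()
    s3.download_file(bucket, key, tmp.name)
    return tmp.name
```

Remember to delete the file yourself when you are done with it.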

I would argue an even better way is to process it all in memory. Instead of tempfile, you can use io in a very similar fashion:

import io

data_stream = io.BytesIO()
s3.meta.client.download_fileobj(const.bucket_name,
                                'class/raw/photo/' + message['photo_name'],
                                data_stream)
data_stream.seek(0)

This way, the data never needs to be written to disk, which is a) faster and b) lets you process bigger files, basically until you hit Lambda's memory limit of 3008 MB.
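A hypothetical round trip with the in-memory approach might look like this (the `transform` callback and output key are made up; `upload_fileobj` is the boto3 counterpart for writing the stream back):

```python
import io

def roundtrip(s3, bucket, key, transform):
    # pull the object into memory, transform the raw bytes, and push the
    # result back to S3 without ever touching the disk
    buf = io.BytesIO()
    s3.download_fileobj(bucket, key, buf)
    buf.seek(0)
    out = io.BytesIO(transform(buf.read()))
    s3.upload_fileobj(out, bucket, 'class/processed/' + key.rsplit('/', 1)[-1])
```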

Upvotes: 16
