aash

Reputation: 1323

Downloading files from s3 in Lambda

I am trying to download files from S3 in AWS Lambda.

There is a web service that pushes S3 metadata (key, bucket) to SQS. I have a Lambda that downloads the file and pushes its contents to Elasticsearch. Here is my code:

import datetime

import boto3

import config

def push_data(event, context):
    try:
        _push_data(event, context)
    except Exception as e:
        print("Exception raised %s" % e)


def _push_data(event, context):
    files_data = get_files_data(event)
    for file_data in files_data:
        is_success, data = push_file(
            index=file_data["index"], file_bucket=file_data["file_bucket"],
            file_key=file_data["file_key"]
        )
        if is_success:
            call_post_push(file_data[0], data)


def push_file(index, file_bucket, file_key):
    start_time = datetime.datetime.now()
    print("I have started downloading %s" % start_time)

    file_path = '/tmp/a.xlsx'
    # download file from s3
    client = boto3.client(
        's3',
        aws_access_key_id=config.AWS_ACCESS_KEY_ID,
        aws_secret_access_key=config.AWS_SECRET_ACCESS_KEY,
    )
    client.download_file(Bucket=file_bucket, Key=file_key, Filename=file_path)
    #

    # contains code to push file contents to elasticsearch
    print("Finished")

When the Lambda executes, it times out after printing "I have started downloading ...".

  1. The Lambda is inside a VPC that has a NAT Gateway configured.
  2. The Lambda has permission to access S3.
  3. The S3 bucket I am downloading the file from is in a different region than the Lambda. However, I don't think this should cause any issue.
  4. I increased the function's timeout to 5 minutes so that a large file would not cause a problem.
  5. I first uploaded a small file to ensure that the download time is small.
  6. I ran the same code on my local machine to check for issues with the download. It takes no more than 1 second to download the file that I am testing with in Lambda.

Doing all the things above didn't help. Please let me know how to solve this problem or if there is any other thing that I need to check.

Upvotes: 4

Views: 9263

Answers (1)

Joseph

Reputation: 542

The issue is that the Lambda function cannot reach the public internet, and thus cannot reach the S3 API endpoint. Most likely the NAT gateway does not reside in a public subnet, meaning it is not in a subnet with an internet gateway as the default route.
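One way to confirm this from inside the function is a plain TCP check with a short timeout, so the Lambda fails fast instead of hanging until it times out. This is only a minimal sketch: `can_reach` and `handler` are names made up for illustration, and the endpoint host assumes the bucket is in `us-east-1`, so substitute your bucket's region.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def handler(event, context):
    # Assumed endpoint for illustration; use the region of your own bucket.
    host = "s3.us-east-1.amazonaws.com"
    print("S3 reachable: %s" % can_reach(host))
```

If this prints `False` from the Lambda but `True` from your local machine, the problem is the network path out of the VPC rather than the download code.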

To fix this, build a NAT gateway in a public subnet and use it as the default route for the subnet containing the Lambda function, or add a VPC endpoint for S3. In the second case, use this VPCE as the route target for S3 traffic in the route table of the subnet containing the Lambda function.

https://docs.aws.amazon.com/vpc/latest/userguide/vpce-gateway.html
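If you go the VPC endpoint route, the endpoint can be created with boto3's EC2 `create_vpc_endpoint` call. The following is a rough sketch, not a tested setup: the helper names are made up, and the VPC ID, route table IDs, and region are placeholders you would have to supply.

```python
def s3_gateway_endpoint_params(vpc_id, route_table_ids, region):
    """Build the arguments for EC2 create_vpc_endpoint for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # Gateway endpoint service names follow com.amazonaws.<region>.s3
        "ServiceName": "com.amazonaws.%s.s3" % region,
        # Route tables of the subnets that contain the Lambda function
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(vpc_id, route_table_ids, region):
    # Imported here so the parameter helper above can be used without boto3.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_params(vpc_id, route_table_ids, region)
    )
```

As far as I know, a gateway endpoint only serves buckets in the same region as the VPC, so for the cross-region bucket in the question the NAT gateway fix is probably the one that applies.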

Upvotes: 3
