Reputation: 1323
I am trying to download files from S3 in AWS Lambda.
There is a web service that pushes S3 metadata (key, bucket) to SQS. I have a Lambda function that downloads the file and pushes its contents to Elasticsearch. Here is my code:
import datetime

import boto3

import config


def push_data(event, context):
    # Entry point: log any exception instead of letting it propagate.
    try:
        _push_data(event, context)
    except Exception as e:
        print("Exception raised %s" % e)


def _push_data(event, context):
    files_data = get_files_data(event)
    for file_data in files_data:
        is_success, data = push_file(
            index=file_data["index"], file_bucket=file_data["file_bucket"],
            file_key=file_data["file_key"]
        )
        if is_success:
            call_post_push(file_data[0], data)


def push_file(index, file_bucket, file_key):
    start_time = datetime.datetime.now()
    print("I have started downloading %s" % start_time)
    file_path = '/tmp/a.xlsx'
    # download file from s3
    client = boto3.client(
        's3',
        aws_access_key_id=config.AWS_ACCESS_KEY_ID,
        aws_secret_access_key=config.AWS_SECRET_ACCESS_KEY,
    )
    client.download_file(Bucket=file_bucket, Key=file_key, Filename=file_path)
    #
    # contains code to push file contents to Elasticsearch
    print("Finished")
When the Lambda function executes, it times out after printing "I have started downloading ...".
Doing all the things above didn't help. Please let me know how to solve this problem or if there is any other thing that I need to check.
Upvotes: 4
Views: 9263
Reputation: 542
The issue is that the Lambda function cannot reach the public internet, and thus cannot reach the S3 API endpoint. Most likely the NAT gateway does not reside in a public subnet, that is, a subnet whose default route points to an internet gateway.
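One way to check this diagnosis from inside the function is a quick TCP probe before the download. The following is only a minimal sketch and not part of the original code: the helper name `can_reach` and the default host `s3.amazonaws.com` are assumptions for illustration (a regional S3 endpoint could be substituted), and it checks nothing more than whether a TCP connection to port 443 opens within a short timeout.

```python
import socket


def can_reach(host="s3.amazonaws.com", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; the host and port defaults are assumptions.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Printing `can_reach()` at the top of the handler should show `False` after a few seconds if the subnet has no route to S3, instead of the invocation hanging until the Lambda timeout.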
To fix this, either build a NAT gateway in a public subnet and use it as the default route for the subnet containing the Lambda function, or add a VPC endpoint (VPCE) for S3 and use that endpoint as the route target in the route table for the subnet containing the Lambda function.
https://docs.aws.amazon.com/vpc/latest/userguide/vpce-gateway.html
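For the VPC endpoint option, a rough sketch with boto3 might look like the following. The function names, and the VPC ID, route table IDs, and region you pass in, are placeholders I made up for illustration; only `create_vpc_endpoint` and its parameters come from the EC2 API. It assumes the caller has permission to create VPC endpoints.

```python
def build_s3_gateway_endpoint_request(vpc_id, route_table_ids, region):
    """Build keyword arguments for EC2 create_vpc_endpoint (gateway type, S3)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # Gateway endpoints for S3 use the service name com.amazonaws.<region>.s3
        "ServiceName": "com.amazonaws.%s.s3" % region,
        # Route tables of the subnets that host the Lambda function
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(vpc_id, route_table_ids, region):
    """Create the endpoint and return its ID; needs boto3 and EC2 permissions."""
    import boto3  # imported here so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **build_s3_gateway_endpoint_request(vpc_id, route_table_ids, region)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```

When route tables are passed this way, AWS adds the route to S3 through the endpoint to each of them, so the Lambda subnets need no NAT gateway just to reach S3.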
Upvotes: 3