Niru

Reputation: 113

Increase read-from-S3 performance of Lambda code

I am reading a large JSON file from an S3 bucket. The Lambda gets called a few hundred times per second. When concurrency is high, the Lambdas start timing out.

Is there a more efficient way of writing the code below, so that I do not have to download the file from S3 every time, or can reuse the content in memory across different instances of the Lambda? :-)

The contents of the file change only once a week!

I cannot split the file (due to the JSON structure) and it has to be read all at once.

import json
import boto3

s3 = boto3.resource('s3')
s3_bucket_name = get_parameter('/mys3bucketkey/')  # helper that looks up the bucket name (e.g. from SSM Parameter Store)
bucket = s3.Bucket(s3_bucket_name)

try:
    bucket.download_file('myfile.json', '/tmp/myfile.json')
except Exception:
    print("File to be read is missing.")

with open('/tmp/myfile.json') as file:
    data = json.load(file)

Upvotes: 3

Views: 6442

Answers (3)

JfrogT

Reputation: 975

Place the code that fetches the file outside the handler function, and if the time since the last fetch is less than X, return the cached result.

X is whatever delay is acceptable, but it should be at least 1 second, so that you call S3 at most once per second no matter how many thousands of requests per second are being handled.

You can also add an in-memory cache manager, or use AWS caching options for Lambda.

You could also use the ETag in S3 and fetch the object's head: check every second, but only replace the in-memory copy if the ETag has changed.

https://aws.amazon.com/blogs/compute/caching-data-and-configuration-settings-with-aws-lambda-extensions/
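A minimal sketch of that pattern, assuming a one-second refresh interval; the bucket and key names are illustrative:

import json
import time
import boto3

s3 = boto3.client('s3')
BUCKET = 'my-bucket'    # illustrative
KEY = 'myfile.json'     # illustrative

# Module-level cache: it survives across invocations of a warm container
_cache = {'data': None, 'etag': None, 'checked_at': 0.0}

def get_data(max_age_seconds=1.0):
    now = time.time()
    # Only talk to S3 if the cached copy is older than the acceptable delay
    if _cache['data'] is None or now - _cache['checked_at'] >= max_age_seconds:
        head = s3.head_object(Bucket=BUCKET, Key=KEY)
        if head['ETag'] != _cache['etag']:
            # First call, or the file changed: download and parse it once
            body = s3.get_object(Bucket=BUCKET, Key=KEY)['Body'].read()
            _cache['data'] = json.loads(body)
            _cache['etag'] = head['ETag']
        _cache['checked_at'] = now
    return _cache['data']

def lambda_handler(event, context):
    data = get_data()
    # ... use data ...
    return {'statusCode': 200}

With this, a warm container issues at most one head_object call per second and only re-downloads the body when the ETag actually changes.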

Upvotes: 0

John Rotenstein

Reputation: 269340

When the Lambda function executes, it could check for the existence of the file in /tmp/ since the container might be re-used.

  • If it is not there, the function can download it.
  • If the file is already there, then there is no need to download it. Just use it!

However, you'll have to figure out how to handle the weekly update. Perhaps a change of filename based on date? Or check the timestamp on the file to see whether a new one is needed?
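A rough sketch of that idea; the local path and the age threshold used to decide whether a fresh download is needed are assumptions, not part of the original code:

import json
import os
import time
import boto3

s3 = boto3.resource('s3')
LOCAL_PATH = '/tmp/myfile.json'
MAX_AGE_SECONDS = 7 * 24 * 3600   # assumption: weekly refresh window

def load_data(bucket_name):
    # Download only if no warm container left a fresh-enough copy in /tmp
    needs_download = (
        not os.path.exists(LOCAL_PATH)
        or time.time() - os.path.getmtime(LOCAL_PATH) > MAX_AGE_SECONDS
    )
    if needs_download:
        s3.Bucket(bucket_name).download_file('myfile.json', LOCAL_PATH)
    with open(LOCAL_PATH) as file:
        return json.load(file)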

Upvotes: 2

Yann

Reputation: 2532

You probably aren't hitting the request rate limit (https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html), but it is worth trying to copy the same S3 file under another prefix.

One possible solution is to avoid querying S3 altogether by putting the JSON file into the function code. Additionally, you may want to add it as a Lambda layer and load it from /opt in your Lambda: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html In this case you can automate the function update whenever the S3 file is updated, by adding another Lambda that is triggered by the S3 update and calls https://docs.aws.amazon.com/lambda/latest/dg/API_UpdateFunctionCode.html
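A minimal sketch of such an updater Lambda; it assumes the object that triggers the S3 event is a rebuilt deployment package (a zip containing the function code plus the new JSON), and the function name is illustrative:

import boto3

lambda_client = boto3.client('lambda')
FUNCTION_NAME = 'my-json-consumer'   # assumption: the function that bundles the JSON

def lambda_handler(event, context):
    # Triggered by the S3 PUT of the new deployment package
    record = event['Records'][0]['s3']
    lambda_client.update_function_code(
        FunctionName=FUNCTION_NAME,
        S3Bucket=record['bucket']['name'],
        S3Key=record['object']['key'],
    )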

As a long-term solution, check out Fargate (https://aws.amazon.com/fargate/getting-started/), with which you can build low-latency container-based services and put the file into the container.

Upvotes: 3
