Reputation: 972
I have a serverless Python Lambda function (lambda1) that is integrated with API Gateway and S3. When a user hits GET /names
on the API Gateway, the Lambda reads a CSV file from S3 and returns the response. The CSV file is big, so the Lambda takes significant time to respond. To reduce the response time, I added a Python list
that caches the CSV file in memory. Now when I upload a new CSV file, I expect the API to return the response from the new CSV, but because of the cache it doesn't (the Lambda doesn't shut down very often because of high traffic).
I also have another Lambda (lambda2), which is invoked whenever a new CSV file is uploaded to the same S3 bucket and processes the CSV file for auditing.
I have 3 ideas for resetting the cache, but I want to know which approach is right or better.
Upvotes: 2
Views: 4742
Reputation: 11708
None of the 3 approaches you listed will work if AWS creates more than one container for your Lambda because of traffic. In that case, each approach will end up triggering only one of the containers: that container gets an updated cache, while the remaining ones keep serving stale data.
Instead, you can cache the lastModified
timestamp in your Lambda as well and check the timestamp of the S3 file before serving each request. If it is not the same as your cached timestamp, a new file has been uploaded and you can update the cache with the latest CSV.
With this approach your response time will go up a little (for the extra time needed to check the timestamp), but at least you will always serve the latest content whenever the file is modified.
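A minimal sketch of the timestamp check, assuming boto3 and a single CSV object. The names `BUCKET`, `KEY`, `get_rows`, and `handler` are hypothetical and not taken from the question; the response shape assumes an API Gateway proxy integration. It uses `head_object` for a metadata-only lookup and reloads the file only when `LastModified` differs from the cached value:

```python
import csv
import io
import json

BUCKET = "my-bucket"  # hypothetical bucket name
KEY = "names.csv"     # hypothetical object key

# Module-level state survives across invocations while the container stays warm.
_cache = {"last_modified": None, "rows": None}
_s3_client = None


def _client():
    """Create the boto3 client once per container (lazy import keeps get_rows testable)."""
    global _s3_client
    if _s3_client is None:
        import boto3
        _s3_client = boto3.client("s3")
    return _s3_client


def get_rows(s3, bucket=BUCKET, key=KEY):
    """Return the CSV rows, re-reading the object only if LastModified changed."""
    # head_object returns metadata only, so it is much cheaper than a full download.
    current = s3.head_object(Bucket=bucket, Key=key)["LastModified"]
    if _cache["rows"] is None or _cache["last_modified"] != current:
        obj = s3.get_object(Bucket=bucket, Key=key)
        text = obj["Body"].read().decode("utf-8")
        _cache["rows"] = list(csv.reader(io.StringIO(text)))
        # Store the timestamp of the object actually downloaded.
        _cache["last_modified"] = obj["LastModified"]
    return _cache["rows"]


def handler(event, context):
    rows = get_rows(_client())
    return {"statusCode": 200, "body": json.dumps(rows)}
```

Because every container runs this check on its own, each one refreshes independently the next time it serves a request, which sidesteps the multi-container problem described above.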
Upvotes: 1