Salman Hasrat Khan

Reputation: 2006

Caching and invalidating AWS Lambda response

I am trying to implement a solution on AWS which is as follows:

I have a crawler that will run once a day to index certain sites. I want to cache this data and expose it in the form of an API, since after crawling this data will not change for an entire day. After the crawler refetches, I want to invalidate and rebuild this cache to serve the updated data. I'm trying to use a serverless architecture to build this.

Possible Solutions

It is clear that the crawler will run on AWS Lambda. What is unclear to me is how to manage the cache that will serve the data. Here are some solutions I thought of:

  1. S3 and CloudFront for caching: After crawling, store the data as .json files in S3, which will be cached by AWS CloudFront. When the crawler refetches new data, it will rebuild these files and ask CloudFront to invalidate the cache.

  2. API Gateway and DynamoDB: After crawling, store the data in DynamoDB, which will then be served through API Gateway with caching enabled. The first problem here is how to invalidate this cache at the end of the day when the crawler re-crawls. The second is cost: since the data will be static for a day, how can I avoid paying for the time DynamoDB sits idle? (If I implement caching on API Gateway, there will be only one call to DynamoDB to fill the cache, and after that it will be idle for the rest of the day.)

Is there any other way that I am missing?

Thanks!

Upvotes: 0

Views: 1727

Answers (1)

Ivan Mushketyk

Reputation: 8295

You can store new data under a different path in S3 that includes the date of creation. Maybe something like:

index_2017_08_11.json

Then there is no need to invalidate caches on the CloudFront side. Since accessing the new objects requires new URLs, the old CloudFront cache entries won't be an issue. You can remove the previous day's S3 files automatically with an S3 lifecycle expiration rule.
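A minimal sketch of the dated-key idea, assuming Python with boto3 in the crawler Lambda. The helper names, the `index_` prefix filter, and the two-day retention are illustrative assumptions, not something from the question; the bucket name is supplied by the caller.

```python
from datetime import date


def dated_key(day: date, prefix: str = "index") -> str:
    """Build an object key such as index_2017_08_11.json for the given day."""
    return f"{prefix}_{day:%Y_%m_%d}.json"


def expiration_rule(prefix: str = "index_", days: int = 2) -> dict:
    """Lifecycle configuration that expires objects under `prefix` after `days` days.

    The rule ID and retention period are assumptions made for this example.
    """
    return {
        "Rules": [
            {
                "ID": "expire-old-crawl-output",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def publish(bucket: str, body: bytes, day: date) -> str:
    """Upload today's crawl result under a dated key and return that key."""
    import boto3  # imported here so the helpers above work without the AWS SDK

    s3 = boto3.client("s3")
    key = dated_key(day)
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")
    return key


def install_expiration_rule(bucket: str) -> None:
    """One-time setup. Note this call replaces the bucket's whole lifecycle configuration."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=expiration_rule()
    )
```

Clients then need a way to learn the current day's URL, for example by computing the same key from the date.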

Another option is to set the Expires HTTP caching header, which specifies when the cached data should be considered stale:

The Expires header field lets you specify an expiration date and time using the format specified in RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1 Section 3.3.1, Full Date, for example: Sat, 27 Jun 2015 23:59:59 GMT

You can set this header in API Gateway to specify when an object should be invalidated.
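As a rough illustration, here is one way to produce such a header from a Lambda function behind API Gateway, assuming the Lambda proxy integration (where the function returns `statusCode`, `headers`, and `body`). Expiring at the end of the current UTC day and the empty placeholder payload are assumptions for the example; adjust the expiry to match when the crawler actually runs.

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime


def expires_at_end_of_day(now: datetime) -> str:
    """Return an HTTP-date string for 23:59:59 GMT on the day of `now`."""
    end = now.astimezone(timezone.utc).replace(
        hour=23, minute=59, second=59, microsecond=0
    )
    # usegmt=True renders the zone as "GMT", as the HTTP date format expects.
    return format_datetime(end, usegmt=True)


def handler(event, context):
    """Lambda proxy-style response carrying an Expires header (sketch only)."""
    now = datetime.now(timezone.utc)
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            "Expires": expires_at_end_of_day(now),
        },
        # Placeholder payload; the real handler would return the crawled data.
        "body": json.dumps({"items": []}),
    }
```

For example, `expires_at_end_of_day(datetime(2015, 6, 27, 10, 0, tzinfo=timezone.utc))` returns `Sat, 27 Jun 2015 23:59:59 GMT`, matching the format quoted above.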

Since the data will be static for a day, how can I not pay for the extra time that DynamoDB will be running

If the data is static, can you store it in S3 and have API Gateway serve it from S3 instead of DynamoDB?

Upvotes: 1
