Reputation: 21
We have configured DynamoDB Streams to trigger a Lambda function. More than 10 million unique records will be inserted into the DynamoDB table within 30 minutes, and Lambda will process these records when triggered through the stream.
As per the DynamoDB Streams documentation, stream records expire after 24 hours.
Question:
Does this mean the Lambda function (across multiple concurrent executions) must finish processing all 10 million records within 24 hours?
If some stream events remain unprocessed after 24 hours, will they be lost?
Upvotes: 2
Views: 2018
Reputation: 2400
As long as you don't throttle the Lambda, it will keep up.
What will happen is the stream records will be batched according to your settings - so if your DynamoDB stream trigger is configured for 5 events at once, it will bundle five events and push them to the Lambda.
Even if that happens hundreds of times a minute, Lambda will (again, assuming you aren't purposely limiting Lambda concurrency) spin up additional concurrent executions to handle the load.
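For reference, here is a rough sketch (untested, with placeholder ARNs and names) of wiring the stream to the Lambda with boto3 and setting those knobs - `BatchSize` controls how many records are bundled per invocation and `ParallelizationFactor` controls concurrent batches per shard:

```python
import boto3

lambda_client = boto3.client("lambda")

# Placeholder ARN and function name - substitute your own.
STREAM_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2024-01-01T00:00:00.000"
FUNCTION_NAME = "process-stream-records"

# Create the event source mapping that wires the DynamoDB stream to the Lambda.
response = lambda_client.create_event_source_mapping(
    EventSourceArn=STREAM_ARN,
    FunctionName=FUNCTION_NAME,
    StartingPosition="TRIM_HORIZON",
    BatchSize=100,                      # records bundled per invocation (up to 10,000 for DynamoDB streams)
    ParallelizationFactor=10,           # concurrent batches per shard (1-10)
    MaximumBatchingWindowInSeconds=5,   # wait a little to fill bigger batches
)
print(response["UUID"])
```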
This is standard AWS philosophy. Pretty much every serverless resource (and even some that aren't, like EC2 with Elastic Beanstalk) is designed to scale horizontally, seamlessly and effortlessly, to handle burst traffic.
Likely your Lambda executions will be done within a couple of minutes of the last event being sent. The '24 hour timeout' is about records sitting in the stream waiting for a Lambda to pick them up or be re-enabled (ie: you can set up scheduled CloudWatch Events to 'hold' DynamoDB Streams processing until certain times of day and then process everything, such as waiting until off hours to let the stream drain, then pausing it again during business hours the next day).
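If you do want that 'hold until off hours' pattern, one way (sketch only; the mapping UUID is a placeholder) is a tiny Lambda that two scheduled CloudWatch Events/EventBridge rules invoke to flip the event source mapping on and off - keeping in mind the 24-hour retention clock keeps ticking while it's paused:

```python
import boto3

lambda_client = boto3.client("lambda")

# Placeholder UUID - comes from create_event_source_mapping / list_event_source_mappings.
MAPPING_UUID = "11111111-2222-3333-4444-555555555555"

def toggle_stream_processing(enabled: bool) -> None:
    """Enable or disable the DynamoDB stream -> Lambda mapping.

    Invoke with enabled=False at the start of business hours and
    enabled=True when off hours begin, via two scheduled rules.
    """
    lambda_client.update_event_source_mapping(UUID=MAPPING_UUID, Enabled=enabled)
```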
To give you a similar example - I ran 10,000 executions through an SQS queue into a Lambda. It completed the 10,000 executions in about 15 minutes. Lambda concurrency is designed to handle this kind of burst flow.
Your DynamoDB read/write capacity is going to be hammered, however, so make sure the table uses on-demand (or at least auto-scaling) capacity rather than fixed provisioned capacity.
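Something like this (sketch; the table name is a placeholder) flips an existing table to on-demand billing so the burst of writes isn't throttled by provisioned WCU limits:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Switch the (hypothetical) table to on-demand capacity mode.
dynamodb.update_table(
    TableName="MyTable",
    BillingMode="PAY_PER_REQUEST",
)
```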
UPDATE
As @Maurice pointed out in the comments, DynamoDB Streams limits how many batches can be in flight at a moment. The calculation indicates that throughput will fall far short even with a short Lambda execution time - the longer the Lambda runs, the less likely you are to complete in time.
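To make that concrete, here's the back-of-envelope shape of that calculation - the shard count, batch size, and Lambda duration below are made-up assumptions, not the figures from the comment thread:

```python
# Illustrative arithmetic only - all inputs are assumptions.
RECORDS = 10_000_000
RETENTION_SECONDS = 24 * 3600

shards = 2                 # assumed number of stream shards
concurrent_batches = 1     # concurrent batches per shard (default ParallelizationFactor)
batch_size = 1_000         # assumed records per invocation
lambda_seconds = 60        # assumed average execution time per batch

achievable = shards * concurrent_batches * batch_size / lambda_seconds
required = RECORDS / RETENTION_SECONDS

print(f"required: {required:.0f} records/s, achievable: {achievable:.0f} records/s")
# required: ~116 records/s, achievable: ~33 records/s under these assumptions ->
# the backlog outlives the 24 h retention window, and a slower Lambda makes it worse.
```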
Which means, if you don't have to have them all processed as quickly as possible, you should divide up the input.
You can add an AWS SQS queue somewhere in the process - most likely before the insert into DynamoDB, because even with the largest batch size and a super quick process you won't get through them all in time.
SQS retains messages for up to 14 days, which may be enough to do what you want. If you have control of the messages coming in, you can insert them into an SQS queue with a delay attached in order to process a smaller number of inserts at once - only what can be accomplished in a single day, or slightly less. It would be:
Lambda to collate your inserts into an SQS queue -> SQS with a delay/smaller batch size -> Lambda to insert smaller batches into DynamoDB -> DynamoDB Stream -> Processing Lambda
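A sketch of the middle 'Lambda to insert smaller batches' piece, assuming each SQS message body is one item destined for a hypothetical `MyTable` - the SQS event source mapping's batch size and concurrency settings are what meter how fast records land in DynamoDB, and therefore how fast the downstream stream/processing Lambda has to work:

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # placeholder table name

def handler(event, context):
    """Consumer Lambda in the middle of the pipeline: SQS batch in, DynamoDB writes out."""
    with table.batch_writer() as batch:
        for message in event["Records"]:
            item = json.loads(message["body"])  # assumes each message body is one DynamoDB item
            batch.put_item(Item=item)
    return {"written": len(event["Records"])}
```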
The other option is to do something similar but use a Step Functions state machine with wait states and maps. Standard state machine executions have a 1-year run time limit, so you have plenty of time with that one.
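A minimal sketch of what that state machine could look like (all ARNs and names are placeholders): a Map over pre-chunked input with `MaxConcurrency: 1` and a Wait before each chunk, so the inserts trickle in slowly enough for the stream consumers to keep up:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Sketch only - the Lambda ARN, role ARN, and wait time are placeholders/assumptions.
definition = {
    "StartAt": "ForEachChunk",
    "States": {
        "ForEachChunk": {
            "Type": "Map",
            "ItemsPath": "$.chunks",      # input is expected to carry a pre-chunked list
            "MaxConcurrency": 1,          # one chunk at a time = throttled inserts
            "Iterator": {
                "StartAt": "Pause",
                "States": {
                    "Pause": {"Type": "Wait", "Seconds": 3600, "Next": "InsertChunk"},
                    "InsertChunk": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:insert-chunk",
                        "End": True,
                    },
                },
            },
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="throttled-inserts",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/stepfunctions-role",  # placeholder role
)
```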
The final option is, instead of streaming the data straight into Lambda, to execute Lambdas that query/scan smaller sections of the DynamoDB table at a time and process them.
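A sketch of that approach using a parallel Scan with segments, so each Lambda (or each invocation) handles one slice of the table - the segment count and table name are assumptions:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # placeholder table name

TOTAL_SEGMENTS = 8  # assumed; one Lambda/invocation per segment

def process_segment(segment: int) -> None:
    """Scan one slice of the table and process its items, page by page."""
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        page = table.scan(**kwargs)
        for item in page["Items"]:
            pass  # process the item here
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```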
Upvotes: 1