Reputation: 11
I'm looking into the best approach for consuming a stream from our DynamoDB tables, and there are conflicting answers/documentation over whether DynamoDB Streams guarantees no duplicates.
The main streams doc page has a table that says for DynamoDB Streams:
No duplicate records appear in the stream.
However, when you get to the Streams with Lambda best practices doc page, it says:
A Lambda consumer for a DynamoDB stream doesn't guarantee exactly once delivery and may lead to occasional duplicates. Make sure your Lambda function code is idempotent to prevent unexpected issues from arising because of duplicate processing.
The other implementation for DynamoDB Streams uses the Kinesis Adapter, but that doc page doesn't mention duplication at all.
Is there some sort of duplication happening between the stream and the Lambda trigger, or is one of these pages just outdated?
Upvotes: 0
Views: 61
Reputation: 19793
This blog post should answer your question in detail: https://aws.amazon.com/blogs/database/build-scalable-event-driven-architectures-with-amazon-dynamodb-and-aws-lambda/
In short, DynamoDB provides exactly-once delivery of events to the stream; however, nothing prevents Lambda from processing the same batch more than once.
To handle duplicate events, you can use Powertools for AWS Lambda, which provides an idempotency utility.
https://docs.powertools.aws.dev/lambda/python/latest/utilities/idempotency/
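For example, here's a minimal sketch using the Powertools idempotency utility to deduplicate per stream record, keyed on the record's eventID (which is unique per mutation in the stream). The "IdempotencyTable" name is just an illustration; you'd need a DynamoDB table with a string partition key named "id", which is the default key attribute the persistence layer expects.

```python
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent_function,
)

# Assumed table name for illustration; needs a string partition key "id".
persistence_layer = DynamoDBPersistenceLayer(table_name="IdempotencyTable")

# Each stream record's eventID is unique per mutation, so it makes a
# natural idempotency key.
config = IdempotencyConfig(event_key_jmespath="eventID")

@idempotent_function(
    data_keyword_argument="record",
    config=config,
    persistence_layer=persistence_layer,
)
def process_record(record: dict):
    # Business logic goes here; it runs at most once per eventID,
    # even if Lambda redelivers the same batch.
    ...

def handler(event, context):
    # Lets Powertools respect the remaining invocation time when locking.
    config.register_lambda_context(context)
    for record in event["Records"]:
        process_record(record=record)
```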
Upvotes: 0
Reputation: 7132
There’s a guarantee the stream contains exactly one record of each mutation, as the docs say.
There's no guarantee a Lambda function will be invoked exactly once for those records, though. For example, a first invocation might fail partway through, and Lambda will retry the same batch.
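If you'd rather not pull in a library, a rough sketch of the same idea with plain boto3: record each eventID with a conditional put and skip records you've already seen. The "ProcessedEvents" table and its "id" key are assumptions for illustration.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
# Hypothetical dedup table with a string partition key "id".
dedup_table = dynamodb.Table("ProcessedEvents")

def handler(event, context):
    for record in event["Records"]:
        try:
            # The conditional put fails if this eventID was already
            # recorded, which is how a redelivered record is detected.
            dedup_table.put_item(
                Item={"id": record["eventID"]},
                ConditionExpression="attribute_not_exists(#pk)",
                ExpressionAttributeNames={"#pk": "id"},
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                continue  # already processed on a previous invocation
            raise
        process(record)  # your actual business logic

def process(record):
    ...
```

Note this sketch marks a record as processed before the business logic runs; if process() can fail, write the marker after processing instead, or combine the marker and the downstream write in one transaction.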
Upvotes: 0