chaos

Reputation: 681

S3 notification creates multiple events

We have been using AWS S3 notifications to trigger Lambda functions when files land on S3. This model worked reasonably well until we noticed that some files were processed multiple times, generating duplicates in our datastore. This happened for about 0.05% of our files.

I know we can guard against this by performing an upsert, but our concern is the cost of running unnecessary Lambda invocations.

I've searched Google and SO, but only found somewhat similar issues. We are not having a timeout problem: the files are processed fully. Our files are rather small; the biggest is less than 400 KB. We are also not receiving the same event twice: the events have different request IDs, even though they run on the same file.

Upvotes: 8

Views: 10067

Answers (4)

Igor S

Reputation: 1

We resolved that issue by limiting the Lambda function's concurrency to 1.

Upvotes: 0

Nitu Parimi

Reputation: 26

If the sequencer value doesn't match between the events, then the export process is uploading the same object multiple times, and each upload triggers an event notification with a different sequencer. In this case, the events are not duplicates, and the Lambda function is invoked each time the object is uploaded. This is expected behavior.

If the sequencer value does match between the events, then the export process is uploading the object only once, but Amazon S3 is generating duplicate events with the same sequencer, resulting in multiple Lambda invocations. This is a rare condition caused by retries within Amazon S3, and the workaround is to store and compare the sequencer values to check for duplicates as each event notification is processed.
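The distinction above can be sketched in Python as a comparison of two event records. This is a minimal sketch, assuming the standard S3 event notification layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`, `Records[].s3.object.sequencer`); the helper names `dedup_key`, `classify_pair` and `make_record` are hypothetical, and the sequencer strings in the example are made-up values. It only checks sequencers for equality, so it does not depend on how S3 orders them.

```python
from urllib.parse import unquote_plus


def dedup_key(record):
    """Build a (bucket, key, sequencer) tuple from one S3 event record."""
    s3 = record["s3"]
    bucket = s3["bucket"]["name"]
    # Object keys arrive URL-encoded in the notification (spaces become '+').
    key = unquote_plus(s3["object"]["key"])
    sequencer = s3["object"]["sequencer"]
    return bucket, key, sequencer


def classify_pair(record_a, record_b):
    """Classify two records that may refer to the same object."""
    bucket_a, key_a, seq_a = dedup_key(record_a)
    bucket_b, key_b, seq_b = dedup_key(record_b)
    if (bucket_a, key_a) != (bucket_b, key_b):
        return "different object"
    if seq_a == seq_b:
        return "duplicate event"  # same upload, notification delivered twice
    return "separate upload"  # same key written again by the producer


def make_record(bucket, key, sequencer):
    # Hypothetical helper that builds only the fields used above.
    return {"s3": {"bucket": {"name": bucket},
                   "object": {"key": key, "sequencer": sequencer}}}


if __name__ == "__main__":
    a = make_record("my-bucket", "exports/file+1.csv", "0055AED6DCD90281E5")
    b = make_record("my-bucket", "exports/file+1.csv", "0055AED6DCD90281E5")
    c = make_record("my-bucket", "exports/file+1.csv", "0055AED6DCD90281F0")
    print(classify_pair(a, b))  # duplicate event
    print(classify_pair(a, c))  # separate upload
```

Only the "duplicate event" case is the rare S3-side retry; "separate upload" means the producer really wrote the object again, so skipping it would lose data.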

Upvotes: 0

Nitu Parimi

Reputation: 26

If the sequencer is the same for the duplicate events: as a workaround, consider sending the notifications to a secondary database, or maintaining an index of S3 objects using event notifications. Then store and compare the sequencer values to check for duplicates as each event notification is processed. I did additional research on how to compare unique values from the event notification in a Lambda function and found article [1], which might be helpful. Also see the external articles [2] and [3] for sample code, and be sure to test this in your development environment before implementing it in production.

References:

[1] https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-idempotent/

[2] https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/

[3] https://adrianhesketh.com/2020/11/27/idempotency-and-once-only-processing-in-lambda-part-1
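One way to store and compare the sequencer values durably is to let the secondary database enforce uniqueness, so a duplicate invocation fails to claim the event and returns early. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for whatever shared store the Lambda function would actually use (for example a DynamoDB table with a conditional write). The table name `processed_events` and the functions `open_store`, `claim_event` and `handle_event` are hypothetical. A local SQLite file is not shared across Lambda execution environments, so this only illustrates the pattern.

```python
import sqlite3
from urllib.parse import unquote_plus


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    # The composite primary key makes a second insert of the same event a no-op.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        " bucket TEXT NOT NULL,"
        " object_key TEXT NOT NULL,"
        " sequencer TEXT NOT NULL,"
        " PRIMARY KEY (bucket, object_key, sequencer))"
    )
    return conn


def claim_event(conn, bucket, key, sequencer):
    """Return True if this call recorded the event, False if already seen."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (bucket, object_key, sequencer)"
            " VALUES (?, ?, ?)",
            (bucket, key, sequencer),
        )
    # rowcount is 1 for a fresh insert and 0 when the row already existed.
    return cur.rowcount == 1


def handle_event(conn, event, process):
    """Call process(bucket, key) once per unique (bucket, key, sequencer)."""
    handled = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])
        sequencer = s3["object"]["sequencer"]
        if not claim_event(conn, bucket, key, sequencer):
            continue  # duplicate notification: skip the expensive work
        process(bucket, key)
        handled.append(key)
    return handled
```

Claiming the event before processing means a failure inside `process` would leave the event marked as done, so a retry would skip it; a production version would likely record a status and clear or expire the claim on failure.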

Upvotes: 1

chaos

Reputation: 681

After wasting quite some time looking through the S3, SNS and Lambda documentation, I found a note specific to S3 notifications that reads:

If your application requires particular semantics (for example, ensuring that no events are missed, or that operations run only once), we recommend that you account for missed and duplicate events when designing your application.

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

Effectively, this means that S3 notifications are the wrong solution for us, but given the research time I've invested in this issue, I thought I'd share it here for anyone else who may have overlooked the page linked above.

Upvotes: 17
