Reputation: 75
The use case is that thousands of very small files are uploaded to S3 every minute, and every incoming object has to be processed and stored in a separate bucket using Lambda. Using s3-object-create as a trigger would cause a large number of Lambda invocations, so concurrency would need to be taken care of. I am trying to batch process the newly created objects every 5-10 minutes instead. S3 provides Batch Operations, but its reports are generated only every day or week. Is there a service available that can help me?
Upvotes: 1
Views: 3665
Reputation: 1151
According to the AWS documentation, S3 can publish "New object created" events to the following destinations: Amazon SNS, Amazon SQS, and AWS Lambda.
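In your case I would publish the S3 events to an SQS queue and subscribe the Lambda function to that queue with a suitable maximum batch size.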
Currently, the maximum batch size for an SQS-Lambda subscription is 10,000 messages for standard queues (10 for FIFO queues). But since your Lambda needs around 2 seconds to process a single event, you should start with something much smaller; otherwise the Lambda will time out before it can process all of the events.
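To pick a starting batch size, a rough back-of-the-envelope check can help. The sketch below assumes the events are processed sequentially and take about 2 seconds each, as stated above; `max_safe_batch_size` and its `safety_margin` parameter are hypothetical helpers for illustration, not part of any AWS API.

```python
def max_safe_batch_size(timeout_s: float, seconds_per_event: float,
                        safety_margin: float = 0.8) -> int:
    """Largest batch that sequential processing can finish before the timeout.

    safety_margin is an assumed fraction of the timeout that may be used,
    leaving headroom for cold starts and unusually slow events.
    """
    if timeout_s <= 0 or seconds_per_event <= 0:
        raise ValueError("timeout_s and seconds_per_event must be positive")
    return int((timeout_s * safety_margin) // seconds_per_event)


# Lambda's maximum timeout is 900 seconds (15 minutes).
print(max_safe_batch_size(900, 2.0))
# A function configured with a 60-second timeout allows far fewer events.
print(max_safe_batch_size(60, 2.0))
```

If the events are processed concurrently rather than in a loop, the per-event time in this estimate shrinks accordingly, so treat the result as a conservative starting point.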
Thanks to this, uploading X items to S3 will produce roughly X / Y Lambda invocations, where Y is the maximum batch size configured for the SQS subscription. For 1000 S3 items and a batch size of 100, it will only trigger around 10 Lambda executions instead of 1000.
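The arithmetic can be sketched as follows. `expected_invocations` is a hypothetical helper, and it assumes every batch is completely full; in practice Lambda may receive partially filled batches, so the real number of invocations can be higher.

```python
import math


def expected_invocations(num_objects: int, batch_size: int) -> int:
    """Minimum number of Lambda invocations needed if every batch is full."""
    if num_objects < 0 or batch_size < 1:
        raise ValueError("num_objects must be >= 0 and batch_size >= 1")
    return math.ceil(num_objects / batch_size)


# 1000 uploaded objects with a maximum batch size of 100.
print(expected_invocations(1000, 100))
```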
The AWS documentation mentioned above explains how to publish S3 events to SQS. I won't explain it here, as it's more of an implementation detail.
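For orientation only, here is a minimal sketch of what the notification configuration might look like when built in Python. The queue ARN, bucket name, and the `build_queue_notification` helper are made-up placeholders; the dictionary follows the shape that boto3's `put_bucket_notification_configuration` accepts. The SQS queue's access policy also has to allow S3 to send messages, which is not shown here.

```python
import json


def build_queue_notification(queue_arn: str, prefix: str = "") -> dict:
    """Build an S3 notification config that sends object-created events to SQS."""
    queue_config = {
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optionally restrict notifications to keys under a given prefix.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


# Placeholder ARN for illustration; substitute your own queue.
config = build_queue_notification(
    "arn:aws:sqs:us-east-1:123456789012:incoming-objects"
)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, it could be applied with:
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-source-bucket", NotificationConfiguration=config)
```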
However, you might run into a problem where the processing is too slow, because the Lambda will probably be processing the events one by one in a loop.
The workaround would be to use asynchronous processing. The implementation depends on which runtime you use for Lambda; for Node.js it would be very easy to achieve.
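As an illustration of the idea in Python rather than Node.js, the sketch below unpacks the S3 notifications from an SQS batch and processes the objects on a thread pool instead of one by one. It assumes the work is I/O-bound (reading from and writing to S3); `process_object` is a hypothetical placeholder for the real processing, and `max_workers` is an arbitrary starting value.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_objects(sqs_event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an SQS batch of S3 notifications."""
    objects = []
    for message in sqs_event.get("Records", []):
        body = json.loads(message["body"])
        # S3 test events carry no "Records" key, so default to an empty list.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def process_object(bucket: str, key: str) -> str:
    """Hypothetical placeholder: read, transform, and write to the target bucket."""
    return f"processed s3://{bucket}/{key}"


def handler(event: dict, context: object = None, max_workers: int = 16) -> List[str]:
    """Lambda entry point: process every object in the batch concurrently."""
    objects = extract_objects(event)
    if not objects:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises a worker's exception when
        # its result is reached, which fails the invocation so that SQS can
        # redeliver the batch.
        return list(pool.map(lambda obj: process_object(*obj), objects))
```

Threads suit this case because the time is spent waiting on network calls; for CPU-heavy processing, more memory (and therefore more CPU) per invocation is likely to help more than extra threads.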
If you want to speed up the processing in other ways, simply reduce the maximum batch size and increase the Lambda memory configuration, so that a single execution processes fewer events and has access to more CPU.
Upvotes: 1