Reputation: 1295
I'm planning to migrate existing image processing logic to AWS Lambda. The Lambda thumbnail generator is better than my previous code, so I want to re-process all the files in an existing bucket using Lambda.
Lambda seems to be event-driven only, which means my Lambda function will only be called via a PUT event. Since the files are already in the bucket, no events will be triggered for them.
I've considered creating a new bucket and moving the files from my existing bucket into it. This would trigger new PUT events, but my bucket has 2MM files, so I don't consider this hack a viable option.
Upvotes: 4
Views: 2607
Reputation: 171
You can add an SQS queue as an event source/trigger for the Lambda, make slight changes to the Lambda so it correctly processes an SQS event as opposed to an S3 event, and then use a local script to loop through a list of all objects in the S3 bucket (with pagination, given the 2MM files) and add them as messages to SQS. When you're done, just remove the SQS event source and the queue.
This doesn't get around writing a script that lists the objects and feeds them to the Lambda function, but the script is really short. It does require setting up a queue, but you won't be able to process the 2MM files with direct calls anyway, due to Lambda concurrency limits.
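The "slight changes" to the Lambda could look roughly like the sketch below, which accepts both S3 notification records and SQS records. The message body format ({"bucket": ..., "key": ...}) and the process_image function are assumptions made for illustration, not something from the original function.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from either an S3 or an SQS Lambda event."""
    pairs = []
    for record in event.get("Records", []):
        if record.get("eventSource") == "aws:sqs":
            # Assumed message format: {"bucket": "...", "key": "..."}
            body = json.loads(record["body"])
            pairs.append((body["bucket"], body["key"]))
        elif "s3" in record:
            # Object keys in S3 notifications are URL-encoded.
            s3 = record["s3"]
            pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def process_image(bucket, key):
    """Hypothetical placeholder for the existing thumbnail logic."""
    print(f"would process s3://{bucket}/{key}")


def handler(event, context):
    # Same processing path regardless of which event source fired.
    for bucket, key in extract_objects(event):
        process_image(bucket, key)
```

Keeping the S3 branch means the function continues to work for new uploads while the backfill runs through the queue.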
Example:
- Call list_objects_v2 on the S3 bucket in a for-loop
- Add the results to the queue with send_message_batch
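A minimal sketch of such a script with boto3 might look like this. The function name enqueue_all, the JSON message format, and the choice to raise on any failed entry are assumptions; SQS accepts at most 10 messages per send_message_batch call, hence the batch size.

```python
import json


def enqueue_all(s3, sqs, bucket, queue_url, batch_size=10):
    """Send one SQS message per object in the bucket; return the count sent."""
    sent = 0
    batch = []

    def flush():
        nonlocal sent
        if not batch:
            return
        # Entry Ids only need to be unique within a single batch.
        entries = [
            {"Id": str(i), "MessageBody": json.dumps({"bucket": bucket, "key": key})}
            for i, key in enumerate(batch)
        ]
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed = response.get("Failed", [])
        if failed:
            raise RuntimeError(f"{len(failed)} messages failed to enqueue")
        sent += len(batch)
        batch.clear()

    # The paginator follows continuation tokens across all result pages.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            batch.append(obj["Key"])
            if len(batch) == batch_size:
                flush()
    flush()  # send whatever is left over
    return sent
```

It would be called with real clients, for example enqueue_all(boto3.client("s3"), boto3.client("sqs"), "my-bucket", queue_url), where the bucket name and queue URL are placeholders for your own values.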
Suggestion: Depending on the throughput of new files landing in your bucket, you may want to switch to S3 -> SQS -> Lambda processing anyway instead of direct S3 -> Lambda calls. For example, a large burst of traffic may hit your Lambda concurrency limit, or an error may occur and you will want to keep the message for a retry (the latter can also be handled by configuring a DLQ for your Lambda).
Upvotes: 1
Reputation: 3181
You don't necessarily have to use S3 as the event source even though you will be dealing with S3 files. For example, you could create a function that accepts a custom event, perhaps with the S3 bucket and image filename as keys, and then calls the AWS SDK to retrieve the actual image data for processing. You can then invoke this function from the console or command line with the bucket and filename you want to process, and you'll be good to go.
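As a rough sketch of this approach, the function below takes a custom event with "bucket" and "key" fields and fetches the object itself, and a small helper invokes it asynchronously from a script. The event field names, the helper invoke_for_object, and the return value are illustrative assumptions; the optional s3 argument exists only so the handler can be exercised without AWS access.

```python
import json


def handler(event, context, s3=None):
    """Lambda entry point for a custom event: {"bucket": "...", "key": "..."}."""
    if s3 is None:
        import boto3  # bundled with the Lambda Python runtime
        s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    image_bytes = obj["Body"].read()
    # Hypothetical: hand image_bytes to the existing thumbnail logic here.
    return {"key": event["key"], "size": len(image_bytes)}


def invoke_for_object(lambda_client, function_name, bucket, key):
    """Asynchronously invoke the function for a single object."""
    payload = json.dumps({"bucket": bucket, "key": key}).encode("utf-8")
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # fire-and-forget; omit for a synchronous call
        Payload=payload,
    )
    return response["StatusCode"]
```

From a script, invoke_for_object(boto3.client("lambda"), "my-thumbnailer", "my-bucket", "photo.jpg") would queue one asynchronous invocation; the function and bucket names here are placeholders.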
Upvotes: 1