rat

Reputation: 1295

Running an AWS Lambda function on existing files in a bucket

I'm planning to migrate existing image processing logic to AWS Lambda. The Lambda thumbnail generator is better than my previous code, so I want to re-process all the files in an existing bucket using Lambda.

Lambda seems to be purely event-driven, which means my Lambda function will only be called via a PUT event. Since the files are already in the bucket, this will not trigger any events.

I've considered creating a new bucket and moving the files from my existing bucket to it, which would trigger new PUT events. But my bucket has 2MM files, so I refuse to consider this hack a viable option.

Upvotes: 4

Views: 2607

Answers (2)

R Ma

Reputation: 171

You can add an SQS queue as an event source/trigger for the Lambda, make slight changes in the Lambda so it correctly processes an SQS event rather than an S3 event, and then use a local script to loop through a list of all objects in the S3 bucket (with pagination, given the 2MM files) and add them as messages to SQS. When you're done, just remove the SQS event source and queue.
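The "slight change" to the Lambda could look like the sketch below: an SQS-triggered handler receives records with a JSON string under `body`, instead of the nested bucket/key structure an S3 event carries. The message shape (`bucket`/`key`) and `process_image` are illustrative placeholders, not AWS-defined names:

```python
import json

def handler(event, context):
    """SQS-triggered entry point; each record's body is the message we enqueued."""
    processed = []
    for record in event["Records"]:
        # An S3 event would instead nest these under record["s3"]["bucket"]["name"]
        # and record["s3"]["object"]["key"].
        message = json.loads(record["body"])
        process_image(message["bucket"], message["key"])
        processed.append(message["key"])
    return {"processed": processed}

def process_image(bucket, key):
    # Placeholder for the existing thumbnail logic (assumed).
    print(f"processing s3://{bucket}/{key}")
```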

This doesn't get around writing a script to list the objects and feed them to the Lambda function, but the script is really short. While this approach does require setting up a queue, you won't be able to process the 2MM files with direct calls anyway, due to Lambda concurrency limits.

Example:

  1. Set up SQS queue and add as event source to Lambda.
    • The syntax for reading an SQS message and an S3 event should be pretty similar
  2. Paginate through list_objects_v2 on the S3 bucket in a for-loop
  3. Create messages using send_message_batch
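Steps 2 and 3 can be sketched with boto3 as below. The bucket name and queue URL are placeholders for your own, and `send_message_batch` accepts at most 10 messages per call, hence the batching helper:

```python
import json

BUCKET = "my-image-bucket"  # assumption: replace with your bucket
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/reprocess"  # assumption

def batches(iterable, size=10):
    """Group items into lists of at most `size`; SQS batch calls take up to 10 messages."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def backfill():
    import boto3  # imported here so the batching helper is usable without AWS configured
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    # list_objects_v2 returns at most 1000 keys per call; the paginator handles that.
    paginator = s3.get_paginator("list_objects_v2")
    keys = (obj["Key"]
            for page in paginator.paginate(Bucket=BUCKET)
            for obj in page.get("Contents", []))
    for batch in batches(keys):
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[{"Id": str(i), "MessageBody": json.dumps({"bucket": BUCKET, "key": k})}
                     for i, k in enumerate(batch)],
        )

if __name__ == "__main__":
    backfill()
```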

Suggestion: Depending on the throughput of new files landing in your bucket, you may want to switch to S3 -> SQS -> Lambda processing anyway instead of direct S3 -> Lambda calls. For example, large bursts of traffic may hit your Lambda concurrency limit, or an error may occur and you want to keep the message (which can be handled by configuring a DLQ for your Lambda).

Upvotes: 1

William Gaul

Reputation: 3181

You don't necessarily have to use S3 as the event source even though you will be dealing with S3 files. For example, you could create a function that accepts a custom event, perhaps with the S3 bucket and image filename as keys, and then calls the AWS SDK to retrieve the actual image data for processing. You can then invoke this function from the console or command line with the bucket and filename you want to process, and you'll be good to go.
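A minimal sketch of that idea, invoking the function with a custom event via boto3. The function name `thumbnail-generator` and the event keys are placeholders, not anything AWS defines:

```python
import json

def build_event(bucket, key):
    """Custom event shape the Lambda would read instead of an S3 event record."""
    return {"bucket": bucket, "key": key}

def invoke(bucket, key):
    import boto3  # imported here so build_event is usable without AWS configured
    client = boto3.client("lambda")
    return client.invoke(
        FunctionName="thumbnail-generator",  # assumed function name
        InvocationType="Event",              # async fire-and-forget
        Payload=json.dumps(build_event(bucket, key)),
    )
```

The same custom event can be pasted into the Lambda console's test dialog, or passed with `aws lambda invoke --payload` from the command line.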

Upvotes: 1
