Reputation: 13
My requirement is: files (1000 of them) will be uploaded to an S3 bucket. Once a file is uploaded, the s3:PutObject event triggers a Lambda function for that object. The function performs a transformation and stores the transformed result in another S3 bucket. I have now made a small change to my Lambda function code, and I need this change to be reflected across all of the transformed results. For this, I need to make the Lambda function take the already uploaded files (1000 files), run the transformation again, and overwrite the results in the other bucket where the transformed output is already stored.
My question is: how do I schedule the Lambda function to take the already uploaded files (1000 files), run the transformation on each, and overwrite the existing results in the output bucket?
Note: all 1000 files have to be processed in sequence, because the transformed results are being written to the same output file, so I limited the reserved concurrency to 1.
Setup: AWS console UI, programming language: Python, file size: 50 MB
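For context, the handler is shaped roughly like this (the bucket name and the transform itself are simplified placeholders):

```python
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-transformed-bucket"  # placeholder for the output bucket name

def transform(data):
    # placeholder for the real transformation logic
    return data

def lambda_handler(event, context):
    # The S3 put notification carries the bucket and key of the uploaded object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Simplified: in my real setup the results of all files end up in one shared output file
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=key, Body=transform(body))
```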
Upvotes: 1
Views: 1510
Reputation: 269470
You can just copy the objects over themselves, which will cause the AWS Lambda function to run again.
When copying the objects over themselves, you need to change something about the object, otherwise you get this error:
copy failed: An error occurred (InvalidRequest) when calling the CopyObject operation: This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.
Therefore, you can add some metadata when performing the copy:
```bash
aws s3 cp --recursive s3://bucket/folder/ s3://bucket/folder/ --metadata ignore=ignore
```
Try it on one file first (without `--recursive`) to confirm that it does what you want, then perform the recursive copy.
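If you would rather drive the copy from Python with boto3 (since your function is already in Python), a rough equivalent for a single object looks like this (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

bucket = "bucket"        # placeholder: the source bucket
key = "folder/file.csv"  # placeholder: one object to re-trigger

# Copying the object onto itself with replaced metadata is enough for S3 to
# accept the request and fire the put-object notification again.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    Metadata={"ignore": "ignore"},
    MetadataDirective="REPLACE",
)
```

To cover all 1000 objects, list them (for example with a `list_objects_v2` paginator) and call this in a loop.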
Upvotes: 4
Reputation: 138
Your workflow / pipeline is basically this: a file is uploaded to S3, the s3:PutObject event triggers the Lambda transformation, and the result is written to the output bucket.
AWS does offer support for other ways of triggering Lambda functions, such as CloudWatch, messaging services such as SQS and SNS, and many others.
That being said, the scenario you described is not just about scheduling the Lambda to run again; it is more about how your code works, the size of the files, the number of files, and whether this is a one-time job only.
In this case, one way to refactor your pipeline (assuming that the file sizes are large) would be something like the image in the link here.
More details on how to use one Lambda to trigger other Lambdas can be found here (https://aws.amazon.com/blogs/architecture/a-serverless-solution-for-invoking-aws-lambda-at-a-sub-minute-frequency/).
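As a rough sketch of that idea (the function name and payload shape are placeholders, not the exact pattern from the linked post), one Lambda can invoke another with boto3 like this:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def invoke_transform(key):
    # "Event" queues an asynchronous invocation; use "RequestResponse" instead
    # if you need to wait for each run to finish before starting the next one.
    lambda_client.invoke(
        FunctionName="my-transform-function",  # placeholder name
        InvocationType="Event",
        Payload=json.dumps({"key": key}),
    )
```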
If the files are very small and each Lambda execution takes only a small amount of time, then another solution would be to create a bash script that does the same thing (from your machine) and uses the AWS CLI to list the objects in the landing bucket.
Then, with a for loop, you can invoke the same Lambda from your terminal, passing each S3 key in the Lambda payload (something like this):
```bash
#!/bin/bash
# Invoke the Lambda once per object in the landing bucket (bucket and function names are examples).
for key in $(aws s3api list-objects-v2 --bucket my-landing-bucket --query 'Contents[].Key' --output text); do
  aws lambda invoke --function-name my-function --cli-binary-format raw-in-base64-out --payload "{\"key\": \"$key\"}" out
done
```
Hopefully this will help.
Upvotes: 0