Reputation: 13
My requirement is: files (1000 of them) will be uploaded to an S3 bucket. Once a file is uploaded, the s3:PutObject event triggers a Lambda function for that object. The function performs a transformation and stores the transformed result in another S3 bucket. I have now made a small change to my Lambda function code, and I need this change to be reflected across all of the transformed results. For this, I need to make the Lambda function take the already uploaded files (1000 files), run the transformation again, and overwrite the results in the other bucket where the transformed output is already stored.
My question is: how do I schedule the Lambda function to take the already uploaded files (1000 files), run the transformation on each, and overwrite the existing results in the output bucket?
Note: all 1000 files have to be processed in sequence, because the transformed results are being written to the same output file, so I limited the reserved concurrency to 1.
Setup: AWS console UI, programming language: Python, file size: 50 MB
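For context, the handler is shaped roughly like this (the bucket name and the transform itself are simplified placeholders):

```python
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-transformed-bucket"  # placeholder for the output bucket name

def transform(data):
    # placeholder for the real transformation logic
    return data

def lambda_handler(event, context):
    # The S3 put notification carries the bucket and key of the uploaded object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Simplified: in my real setup the results of all files end up in one shared output file
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=key, Body=transform(body))
```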
Upvotes: 1
Views: 1510
Reputation: 269470
You can just copy the objects over themselves, which will cause the AWS Lambda function to run again.
When copying the objects over themselves, you need to change something about the object, otherwise you get this error:
copy failed: An error occurred (InvalidRequest) when calling the CopyObject operation: This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.
Therefore, you can add some metadata when performing the copy:
```bash
aws s3 cp --recursive s3://bucket/folder/ s3://bucket/folder/ --metadata ignore=ignore
```
Try it on one file first (without `--recursive`) to confirm that it does what you want, then perform the recursive copy.
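If you would rather drive the copy from Python with boto3 (since your function is already in Python), a rough equivalent for a single object looks like this (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

bucket = "bucket"        # placeholder: the source bucket
key = "folder/file.csv"  # placeholder: one object to re-trigger

# Copying the object onto itself with replaced metadata is enough for S3 to
# accept the request and fire the put-object notification again.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    Metadata={"ignore": "ignore"},
    MetadataDirective="REPLACE",
)
```

To cover all 1000 objects, list them (for example with a `list_objects_v2` paginator) and call this in a loop.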
Upvotes: 4
Reputation: 138
Your workflow / pipeline is basically this: a file is uploaded to S3, the s3:PutObject event triggers the Lambda transformation, and the result is written to the output bucket.
AWS does offer support for other ways of triggering Lambda functions, such as CloudWatch, messaging services such as SQS and SNS, and many others.
That being said, the scenario you described is not just about scheduling the Lambda to run again; it is more about how your code works, the size of the files, the number of files, and whether this is a one-time job only.
In this case, one way to refactor your pipeline (assuming that the file sizes are large) would be something like the image in the link here.
More details on how to use one Lambda to trigger other Lambdas can be found here (https://aws.amazon.com/blogs/architecture/a-serverless-solution-for-invoking-aws-lambda-at-a-sub-minute-frequency/).
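As a rough sketch of that idea (the function name and payload shape are placeholders, not the exact pattern from the linked post), one Lambda can invoke another with boto3 like this:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def invoke_transform(key):
    # "Event" queues an asynchronous invocation; use "RequestResponse" instead
    # if you need to wait for each run to finish before starting the next one.
    lambda_client.invoke(
        FunctionName="my-transform-function",  # placeholder name
        InvocationType="Event",
        Payload=json.dumps({"key": key}),
    )
```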
If the files are very small and each Lambda execution takes only a small amount of time, then another solution would be to create a bash script that does the same thing (from your machine) and uses the AWS CLI to list the objects in the landing bucket.
Then, with a for loop, you can invoke the same Lambda from your terminal, passing each S3 key in the Lambda payload (something like this):
```bash
#!/bin/bash
# Invoke the Lambda once per object in the landing bucket (bucket and function names are examples).
for key in $(aws s3api list-objects-v2 --bucket my-landing-bucket --query 'Contents[].Key' --output text); do
  aws lambda invoke --function-name my-function --cli-binary-format raw-in-base64-out --payload "{\"key\": \"$key\"}" out
done
```
Hopefully this will help.
Upvotes: 0