I am triggering an AWS Lambda from an EC2 instance multiple times in a loop, passing a subset of a 350 MB dataset to each invocation; the Lambda transforms the subset it receives. The Lambda writes its output to a Kinesis Firehose stream, which then writes it to an S3 bucket. The Firehose stream's S3 buffer size is 50 MB and its buffer interval is 350 seconds, so I get around 7 files of 50 MB each after 6-7 minutes.
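A minimal sketch of how the EC2-side loop might split the dataset into per-invocation subsets, assuming the dataset is a list of JSON-serialisable records and that each subset must stay under a Lambda request-payload limit. `batch_records` and `DEFAULT_MAX_PAYLOAD_BYTES` are hypothetical names, and the 5 MiB default is only an illustrative margin below the 6 MB payload quota AWS documents for synchronous invocations (check the current quota for your invocation type).

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical illustrative limit, kept below the documented synchronous
# invocation payload quota; adjust to the quota for your invocation type.
DEFAULT_MAX_PAYLOAD_BYTES = 5 * 1024 * 1024


def batch_records(records: Iterable[dict],
                  max_payload_bytes: int = DEFAULT_MAX_PAYLOAD_BYTES) -> Iterator[List[dict]]:
    """Group records into lists whose compact JSON encoding fits in max_payload_bytes."""
    batch: List[dict] = []
    size = 2  # bytes for the enclosing "[" and "]"
    for record in records:
        # Encoded size of this record plus one byte for a separating comma
        # (counted for every record, so the estimate errs slightly high).
        record_size = len(json.dumps(record, separators=(",", ":")).encode("utf-8")) + 1
        if record_size + 2 > max_payload_bytes:
            raise ValueError("a single record exceeds the payload limit")
        if batch and size + record_size > max_payload_bytes:
            yield batch  # current batch is full; start a new one
            batch, size = [], 2
        batch.append(record)
        size += record_size
    if batch:
        yield batch  # flush the final, partially filled batch
```

Each batch would then be serialised with `json.dumps(batch, separators=(",", ":"))`, the same compact encoding the size estimate assumes, and sent as one invocation's payload.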
I want to trigger a Lambda that combines all the JSON files in S3 and creates a single CSV file from them, once the Kinesis Firehose stream has finished writing all the files to S3.
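A sketch of the combining step, assuming each Lambda writes one JSON object per record followed by a newline (by default Firehose concatenates records as-is, so without a delimiter the objects would run together) and that the records are flat. `json_lines_to_csv` is a hypothetical helper; it takes the object bodies as strings so the S3 download stays separate from the conversion.

```python
import csv
import io
import json
from typing import Iterable, List


def json_lines_to_csv(object_bodies: Iterable[str]) -> str:
    """Merge newline-delimited JSON records from several S3 objects into one CSV string."""
    rows: List[dict] = []
    fieldnames: List[str] = []
    for body in object_bodies:
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue  # tolerate blank lines and trailing newlines
            record = json.loads(line)
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)  # union of keys, in first-seen order
            rows.append(record)

    out = io.StringIO()
    # restval="" fills columns a record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Inside the CSV Lambda, the bodies could come from boto3's `s3.get_object(Bucket=..., Key=...)["Body"].read().decode("utf-8")` for each key under the Firehose output prefix. Because every row is held in memory, the function's memory setting has to accommodate roughly the full 350 MB plus parsing overhead; a streaming variant would need the column set fixed in advance.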
The challenge is knowing when all the Lambdas have finished and the Kinesis Firehose buffer is empty (i.e., it has written all files to S3), so that I can trigger the Lambda that creates the CSV file from all the JSON files in S3.
One option is to wait 350 seconds after the loop, i.e., after the last Lambda has been invoked, and then trigger the CSV-creation Lambda.
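If the loop invokes the Lambdas synchronously (`InvocationType="RequestResponse"` in boto3), "after the loop" really does mean every Lambda has finished and handed its records to Firehose. A minimal sketch, assuming each invocation returns how many records it put on the stream; the hypothetical `invoke` callable is injected so the waiting logic can be shown without AWS credentials. This tells you when the Lambdas are done, not when Firehose has flushed to S3.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def invoke_all_and_wait(batches: Sequence[list],
                        invoke: Callable[[list], int],
                        max_workers: int = 8) -> int:
    """Run invoke(batch) for every batch in parallel and block until all have finished.

    `invoke` is assumed to perform a synchronous Lambda call and return the
    number of records that invocation put on the Firehose stream.
    Returns the total number of records sent.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() re-raises a worker's exception when its result is consumed,
        # so a failed invocation surfaces here instead of being lost.
        counts: List[int] = list(pool.map(invoke, batches))
    # Leaving the "with" block waits for every submitted call to complete.
    return sum(counts)
```

With boto3, `invoke` might wrap `lambda_client.invoke(FunctionName="transform-fn", InvocationType="RequestResponse", Payload=json.dumps(batch).encode())` and read the count from the returned `Payload` stream; the function name and the shape of its return value are hypothetical. The response carries a `FunctionError` field when the function itself failed, so the wrapper should check it and raise, and functions that run longer than botocore's default 60-second read timeout need a client created with a larger `read_timeout` in its `Config`.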
Is there a way to trigger a Lambda after all the Kinesis Firehose stream data has been written, rather than using a timer?
I am not sure about your use case, e.g., why you are using Firehose, but if you want to stick with it, it can work under the conditions below:
This way you will get the whole dataset as a single 350 MB file, and then you can trigger a Lambda that converts it to CSV.
Either way, you are already waiting 6-7 minutes for the 350 MB of data to be transferred, so performance-wise it is the same to set a 350 MB buffer size and a 7-minute buffer interval.
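The files-per-run arithmetic can be sketched as follows, assuming the size threshold (not the interval) triggers every flush; Firehose delivers when whichever threshold is reached first, so slower input yields more, smaller objects and this is roughly a minimum. `estimate_s3_objects` is hypothetical, and its `max_buffer_mb=128` default reflects the upper bound Firehose has documented for S3 buffering size, which would rule out a 350 MB buffer; verify against the current Firehose limits.

```python
import math


def estimate_s3_objects(total_mb: float, buffer_mb: float,
                        max_buffer_mb: float = 128) -> int:
    """Rough number of S3 objects if every Firehose flush is triggered by buffer size."""
    if not 0 < buffer_mb <= max_buffer_mb:
        # Assumption: max_buffer_mb mirrors the documented S3 buffering-size cap.
        raise ValueError(f"buffer size must be in (0, {max_buffer_mb}] MB")
    return math.ceil(total_mb / buffer_mb)
```

With the question's numbers, `estimate_s3_objects(350, 50)` gives 7, matching the observed file count, and even the largest permitted buffer would still split 350 MB across 3 objects.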
Your design has some flaws, IMO:
Where you stand now, you can control how the Lambdas are invoked (async vs. sync), and you can have an S3 trigger, but you can't know when Kinesis/Firehose is done. You will have to change your code/design to avoid finding yourself in a nightmare. You can't just wait X seconds on Kinesis/Firehose: there are many reasons records can be delayed in consumption, and any such delay will break your design.
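One way to pair an S3 trigger with a known total, sketched under assumptions: the producer side records how many records it sent, every Firehose delivery to the bucket fires an S3 `ObjectCreated` notification to a checker Lambda, and a shared store holds per-object record counts. `all_records_delivered` is a hypothetical helper; a plain dict stands in for that store (in practice something like a DynamoDB table with conditional writes, since notifications can be processed concurrently), and `count_records` is injected. It still depends on knowing the expected count up front, which is the kind of design change the answer argues for.

```python
from typing import Callable, Dict
from urllib.parse import unquote_plus


def all_records_delivered(event: dict,
                          count_records: Callable[[str, str], int],
                          delivered: Dict[str, int],
                          expected_records: int) -> bool:
    """Record per-object counts from an S3 event; True once the expected total has landed."""
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (spaces become '+').
        key = unquote_plus(rec["s3"]["object"]["key"])
        if key in delivered:
            continue  # notifications are at-least-once; skip objects already counted
        delivered[key] = count_records(bucket, key)
    # ">=" rather than "==": Firehose retries can deliver duplicate records.
    return sum(delivered.values()) >= expected_records
```

When it returns True, the checker would invoke the CSV-creation Lambda, guarded so that this happens only once. `count_records` could download the object and count its non-empty lines, assuming the Lambdas write newline-delimited records.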
Either: