Reputation: 4387
Currently, I'm implementing a solution based on S3, Lambda and DynamoDB. My use case is, when a new object is uploaded on S3, a first Lambda function is called, downloads the new file, splits it in around 100(or more) parts and for each of them, adds additional information. Next step, each part will be processed by second Lambda function and in some case an insert will be performed in DynamoDB.
My question is only about the best way to call the "second lambda". I mean, the faster way. I want to execute 100 Lambda function(if I'd 100 parts to process) at the same time.
I know there are different possibilities:
1) My first Lambda function can push each part as an item in a Kinesis stream and my second Lambda function will react, retrieve an item and processed it. In this case I don't know if AWS will launch a new Lambda function each time there is a remaining item in the stream. Maybe there is some limitation...
2) My first Lambda function can push each part in an SNS topic and then my second Lambda will react to each new message. In this case I've some doubts about the latency(time between the action to send a message through the SNS topic and the time to my second Lambda function to be executed).
3) My first Lambda function can launch directly the second one by performing an API call and by passing the information. In this case I have no idea if I can launch 100 Lambdas function at the same time. I think I'll be stuck by a rate limitation against the AWS API(I said, I think!)
Somebody have a feedback and maybe advises regarding my use case? One more time, the most important for me it's to have the faster process way.
Thanks
Upvotes: 2
Views: 964
Reputation: 8402
Lambda limits are in place to provide some sane defaults but many workloads quickly exceed them. You can request an increase so this will not be a bottleneck for your use case. This document describes the process: http://docs.aws.amazon.com/lambda/latest/dg/limits.html
I'm not sure how much latency your use case can tolerate but I often use SNS to fan out and the latency is usually sub-second to the next invocation (unless it's Java/coldstart).
If latency is extremely sensitive then you'd probably want to invoke Lambdas directly using Invoke
with the InvocationType
set to "Event". This would minimize blocking while you Invoke
100 times. You could also thread these Invoke
calls within your main Lambda function to further increase parallelism if you want to hyper-optimize.
Cold containers will occasionally cause latency in your invocations. If milliseconds count this can become tricky. People who are trying to hyper-optimize Lambda processing times will sometimes schedule executions of their Lambda function with a "heartbeat" event that returns immediately (so processing time is cheap). These containers will remain "warm" for a small period of time which allows them to pick up your events without incurring "cold startup" time. Java containers are much slower to spin up cold than Node containers (I assume Python is probably equally fast as Node though I haven't tested).
Upvotes: 4