Reputation: 3051
We use Lambda to power APIs (via API Gateway) accessed by news media websites, which receive a fluctuating but heavy load of traffic. We began experiencing throttles, so we raised our concurrency limit to 2000. However, we still see throttles multiple times per day.
Oddly, in the CloudWatch metrics, concurrent executions peak at around 600 or lower when we're being throttled. See this CloudWatch chart as an example:
Has anyone experienced this before? Why do you think this is happening? What can we do about it?
More Information
Additionally, here's an image showing total invocation count and average duration over the same time period. It's hard to know which way the causality runs (duration up because of throttling, or vice versa, since some of the Lambdas call other Lambdas). Note that each series has its own axis, as the scales are quite different.
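In case it helps, here's roughly how we pull those numbers with boto3 (a sketch only; the region and time window are placeholders, and it reads the region-wide AWS/Lambda metrics without a function dimension):

```python
import boto3
from datetime import datetime, timedelta

# Sketch of how we pull the metrics compared above.
# Region and time window are placeholders; adjust as needed.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.utcnow()
start = end - timedelta(hours=3)

def lambda_metric(metric_name, stat):
    """Fetch a region-wide AWS/Lambda metric at 1-minute resolution."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=[stat],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

# ConcurrentExecutions peaks around 600 even while Throttles is non-zero.
concurrency = lambda_metric("ConcurrentExecutions", "Maximum")
throttles = lambda_metric("Throttles", "Sum")
invocations = lambda_metric("Invocations", "Sum")
```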
Upvotes: 12
Views: 4692
Reputation: 46
I think this has to do with Lambda concurrency burst limits.
Basically, there's a limit on how many instances of your Lambda function can be spun up concurrently under sudden load, and this limit is separate from the overall per-region Lambda concurrency limit.
You can find more information about it here:
https://docs.aws.amazon.com/lambda/latest/dg/scaling.html
The relevant part:
AWS Lambda dynamically scales function execution in response to increased traffic, up to your concurrency limit. Under sustained load, your function's concurrency bursts to an initial level between 500 and 3000 concurrent executions that varies per region. After the initial burst, the function's capacity increases by an additional 500 concurrent executions each minute until either the load is accommodated, or the total concurrency of all functions in the region hits the limit.
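If it helps, you can check where the account stands against the per-region limit with boto3; something like the sketch below should work (the function name is a placeholder). Reserving concurrency for your most important function doesn't raise the burst limit itself, but it does stop other functions from eating that capacity during a spike:

```python
import boto3

lambda_client = boto3.client("lambda")

# Region-wide concurrency picture: total limit vs. what's left unreserved.
settings = lambda_client.get_account_settings()
print(settings["AccountLimit"]["ConcurrentExecutions"])           # e.g. 2000
print(settings["AccountLimit"]["UnreservedConcurrentExecutions"])

# Optionally carve out capacity for the most latency-sensitive function.
# "my-api-handler" is a placeholder name.
lambda_client.put_function_concurrency(
    FunctionName="my-api-handler",
    ReservedConcurrentExecutions=500,
)
```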
Upvotes: 3
Reputation: 3950
This sounds very familiar. We had the exact same issue and were baffled, because our concurrency limit had already been increased; unfortunately, that's not a magic fix for infinite scalability of serverless apps.
My guess is that you're running out of ENIs (Elastic Network Interfaces): each Lambda function running in a VPC requires one before it's initialized, and the default limit is 350 concurrently attached ENIs.
Your ~600 concurrent Lambdas are grouped per minute, so I imagine some of them overlap within a single minute, hence more than 350.
To investigate this, go into the account-level (global) settings for API Gateway and provide an IAM role ARN that has permission to put logs to CloudWatch. Then go into the individual API Gateway API and enable verbose execution logging (a boto3 sketch of both steps is below).
Any errors that occur when API Gateway tries to invoke a Lambda function should then show up in those logs rather than being swallowed, which is the default behaviour.
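Roughly the same two steps via boto3, if you prefer scripting it (the role ARN, REST API ID, and stage name are placeholders):

```python
import boto3

apigw = boto3.client("apigateway")

# Account-level setting: the role API Gateway assumes to write to CloudWatch Logs.
# The role ARN is a placeholder; it needs permission to create log groups/streams
# and put log events.
apigw.update_account(
    patchOperations=[
        {
            "op": "replace",
            "path": "/cloudwatchRoleArn",
            "value": "arn:aws:iam::123456789012:role/ApiGatewayCloudWatchLogs",
        }
    ]
)

# Stage-level setting: turn on INFO-level execution logging (plus full
# request/response data) for every method on the stage.
apigw.update_stage(
    restApiId="abc123",   # placeholder REST API ID
    stageName="prod",     # placeholder stage name
    patchOperations=[
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": "INFO"},
        {"op": "replace", "path": "/*/*/logging/dataTrace", "value": "true"},
    ],
)
```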
If the error looks something like this:

{
  "Message": "Lambda was not able to create an ENI in the VPC of the Lambda function because the limit for Network Interfaces has been reached.",
  "Type": "User"
}

...then you'll need to request a limit increase on ENIs.
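To see how close you are to the limit, something like this sketch should give you a count of Lambda-managed ENIs (it assumes Lambda-created interfaces can be identified by a description starting with "AWS Lambda VPC ENI", which is how they appeared in our account):

```python
import boto3

ec2 = boto3.client("ec2")

# Count network interfaces created by Lambda for VPC-enabled functions.
# The description filter is an assumption based on how Lambda names its ENIs.
paginator = ec2.get_paginator("describe_network_interfaces")
count = 0
for page in paginator.paginate(
    Filters=[{"Name": "description", "Values": ["AWS Lambda VPC ENI*"]}]
):
    count += len(page["NetworkInterfaces"])

print(f"Lambda-managed ENIs: {count} (default limit is 350 per region)")
```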
Upvotes: 8