Reputation: 5411
I have Django running on AWS Lambda connecting to MySQL on RDS. Everything works great most of the time.
However, a spike of, say, 10000 concurrent requests spawns many Lambda containers, and each one opens its own database connection, which will eventually exceed the RDS connection limit (as per https://serverfault.com/questions/862387/aws-rds-connection-limits).
What is the best strategy (if any) to get this to web-ish scale without losing SQL? Some ideas:
- Is there a way to limit the number of AWS containers to encourage/force re-use rather than spawning more?
- Can API Gateway detect a spike and delay putting the connections through to Lambda?
- Liberal use of MySQL/RDS read-replicas
Upvotes: 2
Views: 3455
Reputation: 822
You can get around the mismatch between concurrent Lambda executions and available MySQL RDS connections by using RDS Proxy, available since December 2019.
What RDS Proxy actually does is connection pooling (which Lambda cannot do on its own); note, however, that there is an additional cost associated with RDS Proxy.
Ref: https://aws.amazon.com/blogs/compute/using-amazon-rds-proxy-with-aws-lambda/
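On the application side, adopting RDS Proxy mostly means pointing Django at the proxy endpoint instead of the instance endpoint. A minimal sketch of the settings change, where every name, host, and credential below is made up for illustration:

```python
# Hypothetical Django settings fragment: HOST is the RDS Proxy endpoint
# rather than the RDS instance endpoint, so the proxy pools connections
# on behalf of the many short-lived Lambda containers.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "appdb",
        "USER": "app_user",
        "PASSWORD": "change-me",  # better: IAM auth or Secrets Manager
        "HOST": "my-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com",
        "PORT": "3306",
        # Let the proxy own pooling; don't also hold connections in Django.
        "CONN_MAX_AGE": 0,
    }
}
```

With `CONN_MAX_AGE` at 0, each invocation hands its connection back promptly, so the proxy can multiplex many Lambda containers over a small pool of real MySQL connections.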
Upvotes: 4
Reputation: 562230
After the AWS re:Invent conference in November 2017, http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html says:
You can optionally set the concurrent execution limit for a function.
Visit the page for details.
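As a sketch of that per-function control: the helper below sets a reserved concurrency cap so AWS throttles extra invocations instead of letting each new container open another RDS connection. The client is duck-typed so the helper can be exercised without AWS credentials; pass `boto3.client("lambda")` for real use, and "django-app" is a hypothetical function name.

```python
def cap_concurrency(lambda_client, function_name, limit):
    """Reserve `limit` concurrent executions for `function_name`.

    Invocations beyond the cap are throttled by AWS, which keeps the
    number of simultaneous RDS connections bounded.
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )

# Real usage (requires AWS credentials):
#   import boto3
#   cap_concurrency(boto3.client("lambda"), "django-app", 75)
```

The same cap can be set from the CLI with `aws lambda put-function-concurrency`, or in the Lambda console.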
I posted the answer below before the per-function concurrency controls were announced:
You could rely on the AWS limit on concurrent Lambda executions (1000 per region): if the max connection limit on your RDS instance is at least 1000, it won't be a problem. In other words, AWS will throttle your Lambda concurrency before MySQL on RDS limits you. Depending on the instance size, 1000 is not an unusual value for max_connections on MySQL.
If you use the default max connections on RDS, you have to use at least a db.m3.xlarge or db.r3.large (see value of max_connections in AWS RDS). But you can change the max connections in a parameter group (thanks to comment from @Michael-sqlbot for this info).
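Raising max_connections in a parameter group can be scripted. A minimal sketch, with a duck-typed client (pass `boto3.client("rds")` for real use) and a hypothetical parameter group name "django-app-params" that is assumed to already be attached to the instance:

```python
def raise_max_connections(rds_client, group_name, value):
    """Set max_connections in a DB parameter group so RDS accepts more
    simultaneous connections from Lambda containers.
    """
    return rds_client.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=[{
            "ParameterName": "max_connections",
            "ParameterValue": str(value),
            "ApplyMethod": "immediate",  # max_connections is dynamic in MySQL
        }],
    )

# Real usage (requires AWS credentials):
#   import boto3
#   raise_max_connections(boto3.client("rds"), "django-app-params", 1000)
```

Keep in mind that every MySQL connection costs memory on the instance, so raising the limit far beyond the instance-size default trades one bottleneck for another.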
I have another idea if you want to throttle your own concurrent executions for your specific Django app. Note: I have not tried this, I just thought of it.
Suppose you want no more than 75 concurrent executions. Create an SQS queue and prime it with 75 items. Each Lambda invocation removes one item from the queue before it opens its database connection, and restores the item when it finishes.
Thus if you have 75 Lambda functions running, the 76th request will have to wait until one of the Lambda functions restores a value back into the queue.
This architecture allows you to control your scale to any max concurrency you want, up to your max of 1000 per region imposed by AWS. Just prime the SQS queue with a different number of items. You can even increase or decrease the throttling without redeploying your Lambda code, just by adding a few or deleting a few items in the queue.
You can also allow different Lambda functions to have their own "pool" because for example you might have different Django apps connecting to their own RDS instances. Just create a new SQS queue per "pool" and make sure different Lambda functions are using the right SQS queue.
I suppose there's a risk that if any Lambda aborts before it restores its value to the SQS queue, you'd see the queue deplete gradually and eventually become empty. Ideally, the length of the queue would shrink and grow cyclically. But if the length plummets toward zero, it could indicate your Lambda code is crashing. You might want to set up some kind of alert for this with CloudWatch.
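A minimal sketch of this queue-as-semaphore idea (untested, like the idea itself), assuming the queue has been primed with N token messages out of band. The client is duck-typed with boto3-like calls so the class can be tried locally; in a deployment you would pass `boto3.client("sqs")` and a real queue URL:

```python
import time

class TokenPool:
    """Throttle concurrency with an SQS queue primed with N tokens.

    Each invocation takes a token before opening its DB connection and
    puts it back when done, so at most N invocations hold connections
    at once.
    """

    def __init__(self, client, queue_url):
        self.client = client
        self.queue_url = queue_url

    def acquire(self, poll_seconds=1.0):
        while True:
            resp = self.client.receive_message(
                QueueUrl=self.queue_url,
                MaxNumberOfMessages=1,
                WaitTimeSeconds=1,  # long polling on real SQS
            )
            msgs = resp.get("Messages", [])
            if msgs:
                # Consuming the message claims one concurrency slot.
                self.client.delete_message(
                    QueueUrl=self.queue_url,
                    ReceiptHandle=msgs[0]["ReceiptHandle"],
                )
                return
            time.sleep(poll_seconds)  # queue empty: the 76th caller waits here

    def release(self):
        # Restore the token so the next waiting invocation can proceed.
        self.client.send_message(QueueUrl=self.queue_url, MessageBody="token")
```

In the Lambda handler you would call `acquire()` before connecting and `release()` in a `finally:` block, which also mitigates the queue-depletion risk mentioned above.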
Upvotes: 4
Reputation: 200466
Is there a way to limit the number of AWS containers to encourage/force re-use rather than spawning more?
No
Can API Gateway detect a spike and delay putting the connections through to Lambda?
No
Liberal use of MySQL/RDS read-replicas
That's your best option without moving away from relational databases entirely. Or move your Django app back to a more traditional EC2 web server and use database connection pools.
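For the read-replica route, Django's database-router hook is the usual way to fan reads out across replicas. A minimal sketch, assuming hypothetical "replica1"/"replica2" aliases that you would add to settings.DATABASES, each pointing at an RDS read-replica endpoint:

```python
import random

class ReplicaRouter:
    """Minimal Django database router: writes go to "default",
    reads are spread randomly across the read replicas.
    """

    read_aliases = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        return random.choice(self.read_aliases)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases point at the same logical data set.
        return True
```

Activate it with `DATABASE_ROUTERS = ["path.to.ReplicaRouter"]` in settings. Note this only relieves read pressure; all writes still funnel into the primary's connection limit, so it pairs best with one of the throttling approaches above.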
Upvotes: 3