Reputation: 4131
As per my understanding, AWS Lambda runs the uploaded code on EC2 instances that are not accessible to the user. It creates the runtime environment required to run the uploaded code, manages permissions, and balances the load. This is what I think AWS Lambda does behind the scenes.
Quoting Wikipedia:
Infrequently-used serverless code may suffer from greater response latency than code that is continuously running on a dedicated server, virtual machine, or container. This is because, unlike with autoscaling, the cloud provider typically "spins down" the serverless code completely when not in use.
This makes sense, but AWS Lambda does claim to use autoscaling:
AWS Lambda automatically scales your application by running code in response to each trigger. Your code runs in parallel and processes each trigger individually, scaling precisely with the size of the workload.
My questions are -
Upvotes: 2
Views: 5476
Reputation: 1424
Straightforward way: run the function and check the CloudWatch logs (a sample REPORT line follows the list below).
- If your code is very CPU-intensive: allocate more RAM.
Example: lots of mathematical calculations, crypto libs, etc.
- If your code is not CPU-intensive: allocate less RAM.
Example: consuming remote APIs, simple object transformations.
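For reference, each invocation writes a REPORT line to CloudWatch Logs that looks roughly like this (the request ID and numbers below are illustrative):

```
REPORT RequestId: 3f8a-...-example  Duration: 102.35 ms  Billed Duration: 200 ms  Memory Size: 128 MB  Max Memory Used: 42 MB
```

Compare Max Memory Used against Memory Size to decide whether to raise or lower the allocation.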
RAM as Cache
You don't have to worry about how instances are created and destroyed. You can assume that your code will simply run whenever it is needed and that every request will be served; just keep in mind that sometimes a request lands on a newly created instance and other times on an existing one. RAM is not flushed between invocations of the same instance, so you can reuse database connections and use RAM as a cache for frequently used objects within the same instance.
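A minimal Python sketch of this pattern (the handler name, environment variables, and pymysql dependency are assumptions for illustration): everything at module level runs once per instance and is reused on every warm invocation.

```python
import json
import os

import pymysql  # hypothetical dependency; any client works the same way

# Runs once per instance, at cold start. The connection and the cache
# dict survive between invocations on the same (warm) instance.
connection = pymysql.connect(
    host=os.environ["DB_HOST"],        # hypothetical env vars
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
)
cache = {}  # in-memory cache, private to this instance

def handler(event, context):
    key = event["key"]
    if key not in cache:  # a warm instance may already have it cached
        with connection.cursor() as cur:
            cur.execute("SELECT value FROM items WHERE id = %s", (key,))
            row = cur.fetchone()
        cache[key] = row[0] if row else None
    return {"statusCode": 200, "body": json.dumps({"value": cache[key]})}
```

Only the body of `handler` runs per request; the connection setup cost is paid once per instance, not once per invocation.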
Each AWS Lambda instance can only process one request at a time, so whenever your service receives a request and all existing Lambda instances are busy (or none are running), it automatically deploys a new one, up to a default limit of 1,000 concurrent instances.
You define the maximum amount of time your function can take to respond, which limits the cost per request.
As soon as any running instance returns a response, it becomes available to receive more requests.
Existing instances are destroyed after ~15 minutes if no requests are received on them.
All running instances are recycled every ~4 hours.
The process inside the instance is frozen between invocations, including all child processes and async callbacks.
The initialization code (outside the handler function) is executed only on instance startup and is frozen afterwards, in between invocations.
Important limitation:
Lambda functions that respond to HTTP requests are limited to a maximum of 30 seconds to return a response.
Each instance has its own separate memory; you define the amount of RAM allocated for your Lambda in the Lambda console configuration. More RAM means more CPU, but also a higher cost per second.
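If you prefer to script this instead of using the console, a sketch with boto3 (the function name here is hypothetical) that sets both the RAM allocation and the timeout mentioned above:

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical function name; MemorySize is in MB, Timeout in seconds.
# More memory also buys proportionally more CPU.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    MemorySize=512,  # RAM allocation; CPU scales with it
    Timeout=10,      # max time the function may run per request
)
```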
You are only charged for the time your code takes, from the moment it receives the request to the moment it returns a response.
After you run the function, you can use CloudWatch to see the exact duration in milliseconds and the amount of RAM in MB that the function actually used, and adjust accordingly.
Allocating a lot of RAM can look like waste, but more RAM also means more CPU, which can make your function finish in less time and result in a lower cost.
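As an illustration with made-up numbers: Lambda bills in GB-seconds, so a function that needs 10 s at 128 MB consumes 10 × 0.125 = 1.25 GB-s, while the same function finishing in 2 s at 512 MB consumes 2 × 0.5 = 1.0 GB-s. The larger allocation is actually cheaper despite the higher per-second rate.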
Upvotes: 3
Reputation: 46849
Lambda doesn't know the amount of memory and CPU it needs - you tell it when you set up the function and are billed accordingly (and after it is set up you can change it if you want to).
Lambda does not run on a single EC2 instance; it is generally understood that Lambda functions run in Docker containers (which run on EC2 instances under the covers) - or, more likely, AWS uses its EC2 Container Service to orchestrate all of these Lambdas.
It 'scales' by increasing the number of instances running, not the size of the Lambda that is running - so if you flood your Lambda function with hundreds of calls at once, it doesn't increase the memory or CPU for your Lambda, it spins up additional instances to handle the load.
There is a delay when Lambda needs to spin up a new instance - especially if you haven't run it in a while - often referred to as a cold start. Once requests keep coming in, Lambda tends to stay ready to service the next request, so subsequent calls run much faster than the first one or two. Once the calls stop coming in, AWS may spin down the instance, but there is no documentation about when or why this might happen. My experience has been that as long as there is a steady stream of requests, latency is remarkably low - and when you do hit a cold start, it incurs a penalty to get going again.
If you need to reduce the cold-start delay, the easiest way is to specify a larger memory size: memory and CPU scale in tandem, so even if your function doesn't need more memory, giving it more will reduce the initial latency.
Upvotes: 10