Ronan Moriarty
Ronan Moriarty

Reputation: 616

Provisioned concurrency not resolving cold start

I have a lambda function that will be called infrequently in Production, but it will be public-facing, so I want to avoid cold-starts. So I thought I could use provisioned concurrency to avoid this issue. My Cloudformation template looks as follows:

QuoteLinkServiceFunction:
    Type: AWS::Serverless::Function
    Properties:
      # other lambda properties...
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 1

When I create this stack in my Test environment though (where I am the only user, and so there are no other calls happening concurrently), I still experience cold starts when returning to use this function after a few hours. Subsequent calls immediately after the first call run faster as the lambda is now warmed up.

The lambda console shows that the alias for this function has actually been set up with a provisioned concurrency of 1, and I have verified the ALB target group is pointed at the alias. So why am I still getting cold starts?

Upvotes: 27

Views: 14517

Answers (6)

takasoft
takasoft

Reputation: 1278

For me, it was because of not specifying the lambda version/alias. Check the lambda metrics to ensure that your provisioned concurrency environment is actually used. You need to specify the lambda version/alias where the provisioned concurrency is enabled.

Call your lambda like YourLambdaName:1, instead of just YourLambdaName.

https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html#configuring-provisioned-concurrency

You cannot use provisioned concurrency with the $LATEST version of any function.

If your function has an event source, make sure that event source points to the correct function alias or version. Otherwise, your function won't use provisioned concurrency environments.

Upvotes: 0

joshualang
joshualang

Reputation: 1

I also ran into this problem until I noticed that I'm hitting the $Latest version of my Lambda and not the explicit version that has provisioned concurrency. See the provisioned concurrency docs

You cannot use provisioned concurrency with the $LATEST version of any function. If your function has an event source, make sure that event source points to the correct function alias or version. Otherwise, your function won't use provisioned concurrency environments.

Upvotes: 0

notAChance
notAChance

Reputation: 1430

I'm assuming you are going by the Init Duration log in your lambda?

If so, what is likely happening, is that the alias version which you are hitting (assuming you have configured everything correctly and are indeed hitting a provisioned concurrency alias version) has actually spun up a considerable amount of time before you hit the version. Then your request comes in, the lambda executes and shuts down. Then the init duration time shows in the last log line. This is the init taken for that version, which was provisioned and spun up a significant time before you actually called it.

If you are hitting your lambda's alias version, you are not experiencing cold starts - your performance hit is coming from somewhere else.

Upvotes: 0

Marco Roy
Marco Roy

Reputation: 5273

tl;dr:


We're experiencing the same issue, and haven't found any way around it. In the end, Lambda instances are transient, so there's no guaranteed continuous uptime (even with provisioned concurrency).

What provisioned concurrency does give you though is the guarantee of a number of running instances - although these can be swapped with other instances at any point in time (and incur a cold start when that happens). The frequency of the swaps seems pretty arbitrary, and I assume, completely up to AWS.

EDIT: We eventually realized this isn't an issue at all! It just has to do with the nature of how provisioned concurrency functions:

  • With provisioned concurrency, initialization/cold starts still happen, but they happen before the Lambda is made available to be invoked.

    • You can tell this is the case when setting provisioned concurrency to 1 and looking at the logs -- the new instance will be initialized while the previous instance is still being invoked, and then the new instance will start being invoked while the previous instance is discarded.
    • Which means that effectively, clients are not experiencing these cold starts! It's just happening in the background. 👍
  • However, clients could still experience another form of cold start if the lambda function doesn't makes good usage of the static initialization -- which is often slower than the Lambda initialization itself:

In our analyses of Lambda performance across production invocations, data shows that the largest contributor of latency before function execution comes from INIT code.

All that is summed up pretty well by the images below (from the Lambda Performance Optimization Guide).

  • The first two are without provisioned concurrency:

cold starts without provisioned concurrency

cold starts without provisioned concurrency

  • The other two are with provisioned concurrency enabled:

cold starts with provisioned concurrency

cold starts with provisioned concurrency


A good way to tell if Lambda is indeed doing a cold start is to look at the logs in CloudWatch. Each request should have a REPORT log that looks like this:

REPORT RequestId: f840a316-cf35-42ec-8f4d-c03a6cde9192  Duration: 368.80 ms Billed Duration: 369 ms Memory Size: 128 MB Max Memory Used: 93 MB  Init Duration: 3569.10 ms

If you see Init Duration at the end of the log, then it is indeed a cold start. However, with provisioned concurrency, this init duration happened before the Lambda was invoked.

Also, a new CloudWatch log stream seems to be created each time AWS spins up a new Lambda "instance" - which incurs a cold start, confirmed by the fact that the first request of each log stream has an Init Duration. So just taking a look at the "First event time" column will show you all your cold starts (the column can be added via the preferences/gear icon).

This is further confirmed in the Lambda Performance Optimization Guide:

Initialization code is run more frequently than the total number of invocations. Since Lambda is highly available, for every one unit of Provisioned Concurrency, there are a minimum of two execution environments prepared in separate Availability Zones. This is to ensure that your code is available in the event of a service disruption. As environments are reaped and load balancing occurs, Lambda over-provisions environments to ensure availability. You are not charged for this activity. If your code initializer implements logging, you will see additional log files anytime that this code is run, even though the main handler is not invoked.


It can also be a good idea to look at the START log, to make sure that the intended version is being called (the one with provisioned concurrency configured):

START RequestId: f840a316-cf35-42ec-8f4d-c03a6cde9192   Version: 15

It's especially important to make sure that the version is not $LATEST (which cannot benefit from provisioned concurrency):

Each version of a function can only have one provisioned concurrency configuration. This can be directly on the version itself, or on an alias that points to the version. Two aliases can't allocate provisioned concurrency for the same version. Also, you can't allocate provisioned concurrency on an alias that points to the unpublished version ($LATEST).

Upvotes: 23

Rafi
Rafi

Reputation: 2494

A colleague of mine ran a test to figure out what is going on here and the cloudwatch logs are misleading. When you have provisionedConcurrency and you see Init Duration it doesn't mean that it actually took that amount of extra time but what it would have been if there wasn't provisionedConcurrency. I know that's counterintuitive but that's what the test showed.

Test Setup

  1. Lambda #1 - Has ProvisionConcurency = 1
  2. Lambda #2 - Executes Lambda #1 and log the in this lambda how long the execution of lambda #1 took.
  3. Execute Lambda #2 when both lambda #1 and lambda #2 have been idle for a long time too make sure it will trigger a cold start.

Test Result

Lambda #1 Cloud Watch Logs:

2021-06-11T12:09:22.427+03:00   START RequestId: 8f90de41-3c2b-4baf-b843-99173d5862ba Version: 7
2021-06-11T12:09:22.600+03:00   Lambda #1 request: {"Key1":null,"Key2":null,"Key3":null}
2021-06-11T12:09:22.617+03:00   END RequestId: 8f90de41-3c2b-4baf-b843-99173d5862ba
2021-06-11T12:09:22.618+03:00   REPORT RequestId: 8f90de41-3c2b-4baf-b843-99173d5862ba Duration: 189.24 ms Billed Duration: 190 ms Memory Size: 256 MB Max Memory Used: 110 MB Init Duration: 5079.01 ms

Lambda #2 Cloud Watch Logs:

2021-06-11T12:09:21.861+03:00   START RequestId: 3cf51d5a-816b-4319-8db9-c9fee88e3e09 Version: $LATEST
2021-06-11T12:09:22.177+03:00   Lambda#2 request: {"Key1":"value1","Key2":"value2","Key3":"value3"}
2021-06-11T12:09:22.624+03:00   Lambda#1 time taken: 00:00:00.4294049
2021-06-11T12:09:22.635+03:00   END RequestId: 3cf51d5a-816b-4319-8db9-c9fee88e3e09
2021-06-11T12:09:22.635+03:00   REPORT RequestId: 3cf51d5a-816b-4319-8db9-c9fee88e3e09 Duration: 772.90 ms Billed Duration: 773 ms Memory Size: 256 MB Max Memory Used: 109 MB Init Duration: 837.07 ms

Notice: Lambda#1 time taken: 00:00:00.4294049 but the the init duration for lambda #1 is 5079.01 ms. So provisionedConcurrency works like it should just the cloudwatch logs are deceiving.

Upvotes: 2

BAD_SEED
BAD_SEED

Reputation: 5056

Are you sure these are really cold start latencies rather than problem with database connection? Have you think of using X-Ray for tracing? You could wrap the instruction you want to mesure inside a segment.

Here an example application.

Upvotes: 2

Related Questions