Reputation: 79
I'm currently seeing delays of 2-3 seconds on the first requests coming into our APIs. We've set min instances to 1 to prevent cold starts, but the delay still occurs.
If I check the metrics, I don't see any startup latencies in the specified timeframe, so I have no insight into what is causing these delays. Tracing gives the following:
The only thing I can change is switching to "CPU is always allocated", but this isn't helping in any way.
Can somebody give more information on this?
Upvotes: 1
Views: 1095
Reputation: 61
Have you seen this thread?
Cold start in GCP API Gateway?
Setting up a Cloud Scheduler job that calls your API gateway every 10 minutes might help.
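For illustration, here is a minimal Python sketch of what such a keep-warm ping does: it sends one GET request and reports how long it took. The function name `ping`, the `opener` parameter, and any URL you pass in are assumptions made for this example. Cloud Scheduler can also call an HTTP endpoint directly, so a script like this is mainly useful for measuring the first-request latency yourself.

```python
import time
import urllib.request


def ping(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send one GET request and return (status_code, elapsed_seconds).

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        status = response.status
    elapsed = time.perf_counter() - start
    return status, elapsed
```

Calling `ping` twice in a row against your own gateway URL and comparing the two elapsed times gives a rough idea of how much slower the first request is.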
Upvotes: 0
Reputation: 1142
As mentioned in the answer:
As per the doc:
Idle instances
As traffic fluctuates, Cloud Run attempts to reduce the chance of cold starts by keeping some idle instances around to handle spikes in traffic. For example, when a container instance has finished handling requests, it might remain idle for a period of time in case another request needs to be handled.
However, Cloud Run will terminate unused containers after some time if no requests need to be handled. This means a cold start can still occur. Container instances are scaled as needed, and each new instance initializes its execution environment completely. While you can keep idle instances permanently available using the min-instance setting, this incurs cost even when the service is not actively serving requests.
So, let’s say you want to minimize both cost and response time latency during a possible cold start. You don’t want to set a minimum number of idle instances, but you also know any additional computation needed upon container startup before it can start listening to requests means longer load times and latency.
Cloud Run container startup
There are a few tricks you can do to optimize your service for container startup times. The goal here is to minimize the latency that delays a container instance from serving requests. But first, let’s review the Cloud Run container startup routine.
1. Starting the service
   a. Starting the container
   b. Running the entrypoint command to start your server
   c. Checking for the open service port
You want to tune your service to minimize the time needed for step 1a. Let’s walk through 3 ways to optimize your service for Cloud Run response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
As mentioned in the documentation:
Background activity is anything that happens after your HTTP response has been delivered. To determine whether there is background activity in your service that is not readily apparent, check your logs for anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request processing, when the Cloud Run service finishes handling a request, the container instance's access to CPU will be disabled or severely limited. You should not start background threads or routines that run outside the scope of the request handlers if you use this type of CPU allocation. Review your code to make sure all asynchronous operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create unpredictable behavior because any subsequent request to the same container instance resumes any suspended background activity.
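A minimal sketch of "finish all asynchronous operations before you deliver your response", assuming a handler that offloads a write to a thread. The names `audit_log`, `write_audit_entry`, and `handle_request` are made up for this example, and the in-memory list stands in for a real log or database write.

```python
import threading

audit_log = []  # hypothetical in-memory sink standing in for a real write


def write_audit_entry(entry):
    """Side work that must not be left running after the response is sent."""
    audit_log.append(entry)


def handle_request(entry):
    """Start the side work, then wait for it before returning the response."""
    worker = threading.Thread(target=write_audit_entry, args=(entry,))
    worker.start()
    # Block until the thread is done, so nothing is still pending when
    # CPU access is throttled after the response has been delivered.
    worker.join()
    return "ok"
```

Without the `join()` call, the thread could still be running when the response goes out, and its remaining work would only resume when the next request reaches the same instance.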
As mentioned in the thread, the reason could be that the operations you perform happen after the response is sent.
According to the docs, CPU is allocated only during request processing by default, so the only thing you have to change is to enable CPU allocation for background activities.
You can refer to the documentation for more information on the steps to optimize Cloud Run response times. You can also have a look at the blog post on using Google API Gateway with Cloud Run.
Upvotes: 0