Chris

Reputation: 1660

Why isn't Google Cloud Run performing the startup probe after scaling?

I have written a Rust server that runs inside a Docker container on Google Cloud Run. The server receives infrequent requests and immediately responds with a 200 status code acknowledgement. It then runs an asynchronous background job and sends a callback request once it is done. The server queues background jobs and runs one at a time.

While the job runs, it requests its own /ping endpoint to keep the instance alive so that Cloud Run does not scale to 0 instances. Upon receiving a SIGINT, the server exits immediately. This workflow appears to be correct: I can see that the instance is kept alive while the background job is running.
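For reference, the keep-alive and shutdown logic follows roughly this pattern. This is a simplified sketch rather than the actual code, assuming the tokio (full features) and reqwest crates; in the real server the pinger only runs while a job is in progress.

use std::time::Duration;

async fn keep_alive() {
    // Ping our own public /ping endpoint periodically while a job is running,
    // so Cloud Run keeps seeing traffic and does not scale the instance to 0.
    let mut ticker = tokio::time::interval(Duration::from_secs(15));
    loop {
        ticker.tick().await;
        let _ = reqwest::get("https://my-service-tichlvfbva-nw.a.run.app/ping").await;
    }
}

#[tokio::main]
async fn main() {
    // ... start the monitor thread, worker thread and HTTP server here ...

    // Spawned unconditionally here only to keep the sketch short.
    let pinger = tokio::spawn(keep_alive());

    // Exit immediately on SIGINT, as described above.
    tokio::signal::ctrl_c().await.expect("failed to listen for SIGINT");
    println!("SIGINT received. Exiting.");
    pinger.abort();
}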

I have configured Cloud Run with --min-instances=0 and --max-instances=1; the full deploy command is included in the edit at the bottom of this question.

After deploying and testing this scale-up/scale-down process, everything seems to work correctly. However, if several hours pass with no activity and a new request then arrives, the instance does not seem to scale from 0 to 1 instances correctly. Normally, when the server starts, I see the following in the logs:

POST https://my-service-tichlvfbva-nw.a.run.app/run-background-job
[1694802575] Monitor thread started.
[1694802575] Worker thread started.
[1694802575] Server started at http://0.0.0.0:8080.
INFO 2023-09-15T18:29:35.936759Z Default STARTUP TCP probe succeeded after 1 attempt for container "my-service-1" on port 8080.
[1694802576] Running background job...
GET https://my-service-nw.a.run.app/ping
GET https://my-service-nw.a.run.app/ping
GET https://my-service-nw.a.run.app/ping
[1694802578] Sending callback HTTP request.
[1694802578] Job finished.
[1694803476] SIGINT received. Exiting.

However, after several hours of inactivity, the startup TCP check does not appear to run. The logs show:

POST https://my-service-tichlvfbva-nw.a.run.app/run-background-job
Container terminated on signal 4.

The POST request fails with 503 Service Unavailable, and all subsequent POSTs fail with the same error. I then have to re-deploy the service to get it working again, after which it runs for several hours until it becomes unavailable once more. I don't understand why I am getting a SIGILL (signal 4), and I don't understand why the startup probe isn't running.

The background job does run some AVX2 and AVX-512 instructions, but the program checks at runtime whether these are available on the target platform. The server doesn't seem to be getting that far anyway, since it never logs the 'Monitor thread started.' line, which is printed before any request processing. I'm very confused as to what is going wrong.
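The runtime check uses the standard library's feature detection, roughly like the sketch below (shown for AVX2; the AVX-512 path is analogous). This is not the actual job code; the process_* functions are placeholders.

fn run_job(data: &[u8]) {
    // is_x86_feature_detected! queries the CPU at runtime, so the SIMD path
    // is only taken on machines that actually report AVX2 support.
    if is_x86_feature_detected!("avx2") {
        unsafe { process_avx2(data) }
    } else {
        process_fallback(data)
    }
}

#[target_feature(enable = "avx2")]
unsafe fn process_avx2(data: &[u8]) {
    // The AVX2 implementation of the job would go here.
    process_fallback(data)
}

fn process_fallback(data: &[u8]) {
    // Plain scalar implementation of the job.
    let _ = data;
}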

I haven't tried the following steps yet because I'd like to understand what's going wrong first, but I thought I might try:

  1. Switching to 'First generation' instances (maybe they're more reliable?)
  2. Replacing the TCP probe with an HTTP probe (maybe it will always run, in that case?)
  3. Setting the minimum number of instances to 1 to prevent scaling down (this will cost more money)

Edit: As requested in the comments, here is the deploy command. I'm not using a service.yml.

gcloud run deploy my-service \
--image=europe-west2-docker.pkg.dev/my-service-398017/docker/my_service:${{ github.ref_name }} \
--region=europe-west2 \
--allow-unauthenticated \
--command=./server \
--args=--port,8080 \
--execution-environment=gen2 \
--min-instances=0 \
--max-instances=1 \
--cpu=8 \
--memory=4Gi \
--no-cpu-throttling

Upvotes: 2

Views: 6742

Answers (1)

Rohit Kharche

Reputation: 2919

While the job runs, Cloud Run checks that the service is responsive via the TCP/HTTP probe. But when the service is idle for 15 minutes, it automatically scales down to min-instances: 0. In your case, the TCP probe needs an active instance to run its checks against, so when the service scales back up the TCP probe does not happen.

The default timeout for scaling down to zero is 15 minutes, although this can be configured. Cloud Run may keep some instances idle for up to 15 minutes to minimize the impact of cold starts, and after that idle period the service scales down.

This means that when a new request arrives after several hours, Cloud Run needs to provision a new instance before it can serve the request. This can take a few seconds, which is why you are seeing a 503 Service Unavailable error.

Possible solutions:

  • Keep at least 1 minimum instance alive so that the probe checks can run successfully, e.g. by changing --min-instances=0 to --min-instances=1 in your deploy command. This will also increase your costs, as you will be charged for the running instance even when it is not serving any traffic.

  • Implement an HTTP probe: a TCP probe only checks whether the container is listening on the specified port, whereas an HTTP probe makes an actual HTTP request to your service to check that it is healthy, even if the service is not receiving any traffic.

The best option would be to use an HTTP probe, as it does not require a minimum instance: probe requests are routed through the load balancer, which can always probe your service's endpoint even if no instances are running.
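As far as I know, the startup probe is configured through the service YAML (applied with gcloud run services replace service.yaml) or in the Cloud console rather than as a flag on gcloud run deploy. A minimal sketch of the relevant part of the YAML, assuming your /ping endpoint on port 8080 (values are illustrative):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      containers:
        - image: europe-west2-docker.pkg.dev/my-service-398017/docker/my_service:TAG  # tag is illustrative
          ports:
            - containerPort: 8080
          startupProbe:
            httpGet:
              path: /ping
              port: 8080
            periodSeconds: 3
            timeoutSeconds: 1
            failureThreshold: 10

Since your server already exposes /ping for its keep-alive requests, it is a natural endpoint to probe; it should return a 2xx quickly so the probe succeeds as soon as the server is listening.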

Here are my answers to the questions you raised:

Switching to 'First generation' instances (maybe they're more reliable?)

No, I don't think that would be a good option, as gen2 has some advantages compared to gen1.

Replacing the TCP probe with an HTTP probe (maybe it will always run, in that case?)

Yes. As mentioned above, an HTTP probe should do the trick for you.

Setting the minimum number of instances to 1 to prevent scaling down (this will cost more money)?

Yes, it is more costly than your current setup. You can refer to the Cloud Run pricing documentation for more details.

Upvotes: 0
