Reputation: 495
I'm using managed Cloud Run to deploy a container with concurrency=1. Once deployed, I'm firing four long-running requests in parallel.
Most of the time everything works fine, but occasionally I get 500s from one of the nodes within a few seconds; the logs only contain the error message quoted in the subject.
Retrying with exponential back-off did not improve the situation; the retries also end in 500s. Stackdriver logs do not provide further information either.
Potentially relevant gcloud beta run deploy arguments:
--memory 2Gi --concurrency 1 --timeout 8m --platform managed
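For reference, the full command presumably looked something like this (the service name, image, and region are placeholders I've added):
# Placeholder service name, image, and region; the flags are the ones listed above.
gcloud beta run deploy my-service \
  --image gcr.io/my-project/my-image \
  --memory 2Gi --concurrency 1 --timeout 8m \
  --platform managed --region us-central1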
What does the error message mean exactly, and how can I solve the issue?
Upvotes: 36
Views: 25061
Reputation: 777
This error can be business as usual for Cloud Run during scaling.
During scale-up, the GCP networking stack routes your request to a cold-starting instance even though it hasn't passed its health check yet, so the client is left hanging for the duration of the cold start plus the duration of the request.
This is suboptimal, since you might already have Cloud Run instances with spare capacity that could serve the request immediately. Ideally, no request should be routed to a cold-starting instance as long as the existing instances are not overloaded.
The load balancer keeps the client waiting until the cold start plus the request have finished, and these error messages pop up when a timeout is hit. The timeout can come from a combination of the load balancer timeout, the Cloud Run service timeout, the client timeout, and the GCP infrastructure timeout (10 seconds?). On timeout, the load balancer reports response_sent_by_backend with status 500, even though your "backend" instance, i.e. your container, never received the request because of the networking layer.
For me, the main question is: why are Cloud Run instances scaling up in scenarios where, according to the docs, they shouldn't?
Based on the autoscaling logic described here and here, you might have no reason for Cloud Run to scale up, yet it suddenly does.
For example, for the log entries where a duration is displayed, it's important to compare the log's receiveTimestamp with its timestamp: timestamp is the time the request arrived, receiveTimestamp is the time the response was sent.
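As a rough illustration (the service name and filter are assumptions on my part), you can pull both fields for the failing requests with gcloud logging read and compare them:
# Sketch: list timestamp vs receiveTimestamp for Cloud Run request logs with status 500.
# "my-service" is a placeholder for your service name.
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="my-service" AND httpRequest.status=500' \
  --limit 20 \
  --format 'table(timestamp, receiveTimestamp, httpRequest.latency)'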
Upvotes: 0
Reputation: 499
This error can have several causes.
We have faced a similar issue sporadically, and in our case it was due to long request processing times when DB latency was high for a few requests.
Upvotes: 1
Reputation: 2280
Setting the Max Retry Attempts to anything other than zero will remedy this, as it did for me.
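The answer doesn't say what invokes the service; assuming it's triggered through a Cloud Tasks queue, the retry count could be raised roughly like this (the queue name and value are placeholders):
# Assumption: the Cloud Run service is invoked via a Cloud Tasks queue.
# A non-zero --max-attempts lets aborted requests be retried instead of failing outright.
gcloud tasks queues update my-queue --max-attempts 5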
Upvotes: 0
Reputation: 354
We also faced this issue when traffic suddenly increased during business hours. It is usually caused by a sudden increase in traffic combined with the longer time needed to start new instances to accommodate the incoming requests. One way to handle this is to keep warm instances always running, i.e. to configure the --min-instances parameter in the Cloud Run deploy command (see the example below). Another, recommended way is to reduce the service's cold start time, which is difficult to achieve in some languages such as Java and Python.
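A minimal sketch of that deploy flag, with placeholder service and image names:
# Keep at least one warm instance running so traffic spikes don't hit cold starts.
# "my-service", the image, and the value 1 are placeholders.
gcloud run deploy my-service \
  --image gcr.io/my-project/my-image \
  --min-instances 1 \
  --platform managed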
Upvotes: 10
Reputation: 3132
I was able to resolve this on my service by raising the maximum autoscaling container count from 2 to 10. There really should be no reason that 2 would be even close to too low for the traffic, but I suspect something about the Cloud Run internals was tying up the 2 containers somehow.
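Assuming the limit is set on the service itself, raising it looks roughly like this (the service name is a placeholder):
# Raise the autoscaling ceiling from 2 to 10 instances.
gcloud run services update my-service --max-instances 10 --platform managed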
Upvotes: 2
Reputation: 75765
I also experience the problem, and it is easy to reproduce. I have a Fibonacci container that takes 6s to compute fibo(45). I use hey to perform 200 requests, and I set my Cloud Run concurrency to 1.
Out of the 200 requests I get 8 similar errors. In my case: a sudden traffic spike and a long processing time. (The cold start is short for me, since it's written in Go.)
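A rough sketch of that load test with hey; the URL, the path, and the client-side concurrency are placeholders:
# Send 200 requests at the service; each one computes fibo(45) in ~6s.
# -c 50 (client concurrency) and the URL are placeholders.
hey -n 200 -c 50 -t 0 https://my-service-xyz-uc.a.run.app/fibo/45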
Upvotes: 6
Reputation: 7909
This error message can appear when the infrastructure didn't scale up fast enough to catch up with the traffic spike. The infrastructure only keeps a request in the queue for a certain amount of time (about 10s) before aborting it.
This usually happens when traffic spikes suddenly or when requests take a long time to process, so new instances cannot be started fast enough to absorb the load.
Upvotes: 21