XII

Reputation: 558

Random 503 / 504 Errors in Google Cloud Run (Java)

We are currently running a Java 17 app on Cloud Run and have encountered an unusual issue. While the service usually operates smoothly, a small percentage of requests (both GET and POST) fail unexpectedly.

These failed requests return either a 503 or a 504 status, and today I noticed that they often occur in pairs. The failed requests also share the same instance ID, yet some successful requests were served by that same instance. Meanwhile, the liveness probe keeps passing even while customer-facing requests fail. The liveness probe checks our database and Redis connections as well as other integrations, such as file storage connections.
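To confirm that the failures really cluster on one instance rather than being spread across the service, one option is to export the request logs and count 5xx responses per instance. The sketch below assumes the log entries have already been parsed into a hypothetical RequestLog(instanceId, status) record; the sample data is made up for illustration and is not taken from real logs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FailuresByInstance {

    // Hypothetical, simplified view of one request log entry; in practice these
    // fields would be extracted from exported Cloud Run request logs.
    record RequestLog(String instanceId, int status) {}

    // Counts 5xx responses per instance ID so that clustering on one instance stands out.
    static Map<String, Integer> count5xxByInstance(List<RequestLog> logs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (RequestLog log : logs) {
            if (log.status() >= 500 && log.status() <= 599) {
                counts.merge(log.instanceId(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        List<RequestLog> logs = List.of(
                new RequestLog("instance-a", 200),
                new RequestLog("instance-a", 503),
                new RequestLog("instance-a", 504),
                new RequestLog("instance-b", 200),
                new RequestLog("instance-b", 200));
        System.out.println(count5xxByInstance(logs)); // prints {instance-a=2}
    }
}
```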

The 503s include the following text payload:

The request failed because either the HTTP response was malformed or connection to the instance had an error. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#malformed-response-or-connection-error

Another Spring Boot app that calls this API through a FeignClient receives a feign.FeignException$ServiceUnavailable. I'm wondering whether this could be a load balancer issue: perhaps the health checks pass because they bypass the load balancer, while the actual requests go through it and are affected?
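While the root cause is being investigated, the calling app could retry transient 503/504 responses with exponential backoff. This is a minimal, framework-free sketch of that idea, not a Feign-specific implementation: the call is modeled as a hypothetical IntSupplier that returns an HTTP status code, and treating 503/504 as retryable is an assumption. Retrying is only safe for idempotent requests (such as GET); a POST that ended in a 504 may already have been processed.

```java
import java.util.Set;
import java.util.function.IntSupplier;

public class RetryOnUnavailable {

    // Status codes treated as transient in this sketch (an assumption, not a Cloud Run rule).
    static final Set<Integer> RETRYABLE = Set.of(503, 504);

    // Invokes `attempt` up to maxAttempts times, sleeping base, 2*base, 4*base, ... ms
    // between attempts while it keeps returning a retryable status.
    static int callWithRetry(IntSupplier attempt, int maxAttempts, long baseDelayMs) {
        int status = attempt.getAsInt();
        for (int i = 1; i < maxAttempts && RETRYABLE.contains(status); i++) {
            try {
                Thread.sleep(baseDelayMs << (i - 1));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return status; // stop retrying and report the last status seen
            }
            status = attempt.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated upstream: fails with 503, then 504, then succeeds.
        int[] responses = {503, 504, 200};
        int[] calls = {0};
        int status = callWithRetry(() -> responses[calls[0]++], 3, 10);
        System.out.println(status + " after " + calls[0] + " calls"); // prints 200 after 3 calls
    }
}
```

In a Feign client the same policy would typically be wired through Feign's own Retryer and ErrorDecoder extension points rather than a hand-rolled loop; retries only hide the symptom, so the logs should still be checked for the underlying cause.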

Our CPU and memory usage are within reasonable limits, so I don't believe the issue is due to our instances being under-provisioned. Many of the failing requests are "simple" requests that typically respond in under 100ms.

Upvotes: 0

Views: 87

Answers (1)

J_Dubu

Reputation: 169

If you haven't already, check the Cloud Run troubleshooting guide for the recommended steps to rule out an application-side failure:

  • Check Cloud Logging

  • App-level timeouts

  • Downstream network bottleneck

  • Inbound request limit to a single container
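For the app-level timeouts item, one thing worth verifying is that outbound calls made while serving a request cannot hang indefinitely. A minimal sketch using the JDK's java.net.http.HttpClient; the 2 s / 5 s values and the downstream URL are placeholders, not recommendations, and they should stay well below the request timeout configured for the Cloud Run service.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class DownstreamTimeouts {

    // Maximum time to wait for a connection to the downstream service to be established.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(2);
    // Maximum time to wait for the response to a single request.
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(5);

    static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // The URL passed in is a hypothetical downstream dependency.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(REQUEST_TIMEOUT) // HttpTimeoutException if no response in time
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = buildClient();
        HttpRequest request = buildRequest("https://downstream.example.com/health");
        // Only prints the configured values; no network call is made here.
        System.out.println(client.connectTimeout().orElseThrow() + " / "
                + request.timeout().orElseThrow()); // prints PT2S / PT5S
    }
}
```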

Another thing to investigate is whether there is a mismatch in the location of your resources. This solution worked here and may be useful to you as well.

If none of the above resolves it, this may be a Cloud Run-specific issue that is better addressed by the Google Cloud Support team. You can reach them through the following channels:

Upvotes: 0
