Reputation: 4154
Consider a software system that consists of multiple web services (perhaps microservices) communicating with one another. In the course of a certain use case, a client sends a POST web request to an external service A, which in turn needs to send a POST web request to an internal service B.
In case service B is currently unavailable and, e.g., returns HTTP 503, service A will perform internal retries for some time to try and cope with the outage of its dependency.
Now, if the retries don't fix the problem, what is the best practice for service A? Should it also return 503 to indicate unavailability? Or perhaps 502, as it "acted as a gateway"?
And what should the client do? If it retries as well, this might easily cause a retry cascade. On the other hand, many HTTP client libraries (e.g., Microsoft.Extensions.Http.Resilience) will retry on any 5xx error, so maybe it doesn't really matter anyway?
I hope this question won't be judged as being opinion-based - surely, with service-oriented and microservices architectures having been around for some time, there must be some standard patterns for this common scenario?
(Of course, service A should ideally not synchronously depend on service B - asynchronous messaging is king with regard to resilience. However, let's just assume that this is a synchronous use case and it's not feasible to remodel it as an asynchronous one. And as these are POST requests with desired side effects, we cannot just throw caching or data replication at the problem, either.)
Upvotes: 1
Views: 237
Reputation: 22829
There is no de facto standard status code for this situation, as far as I know. Generally speaking, the following HTTP status codes are usually considered retriable because they represent a transient issue:
It is also worth mentioning that the Retry-After header is usually used only with these status codes: 301, 429, and 503.
With 301, the Retry-After header gently asks the client to wait a bit before following the redirect.
429 is usually used for throttling / rate-limiting scenarios.
Both 429 and 503 with a Retry-After header are considered back-pressure signals: please back off a bit, because I'm either overloaded or self-healing or ...
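To illustrate how a caller might honor that back-pressure signal, here is a minimal Python sketch that turns a Retry-After value into a wait time. The header value can be either a number of seconds or an HTTP date. The function name `retry_after_seconds` and the optional `now` parameter are made up for this example and are not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the number of seconds to wait, or None if the header is
    missing or cannot be parsed. (Hypothetical helper for illustration.)"""
    if header_value is None:
        return None
    value = header_value.strip()

    # Form 1: delay in seconds, e.g. "Retry-After: 120"
    if value.isascii() and value.isdigit():
        return float(value)

    # Form 2: HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry right away", so never go below zero.
    return max(0.0, (when - now).total_seconds())
```

A caller would typically cap the returned value with its own maximum wait, so that a misbehaving server cannot stall it indefinitely.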
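502 and 504 could also be considered valid status codes for this scenario, since your service A acts as a proxy and both status codes say that there was an upstream problem during request processing: 502 (Bad Gateway) means the gateway received an invalid response from the upstream server, and 504 (Gateway Timeout) means it did not receive a timely response from it.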
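Since there is no standard here, the following Python sketch shows just one possible convention for what service A could return once its retries against service B are exhausted. The function name, its parameters, and the mapping itself are assumptions made for illustration, not an established rule.

```python
from typing import Optional


def status_for_upstream_failure(upstream_status: Optional[int],
                                timed_out: bool) -> int:
    """Pick the status code service A returns to its client after giving up
    on service B. (Hypothetical mapping, one convention among several.)

    upstream_status: last status received from B, or None if B never answered.
    timed_out: True if the last attempt ended in a timeout.
    """
    if timed_out:
        # B did not answer in time.
        return 504  # Gateway Timeout
    if upstream_status == 503:
        # B reported temporary unavailability; pass that signal on.
        return 503  # Service Unavailable
    # Anything else (connection refused, unexpected 5xx, malformed reply).
    return 502  # Bad Gateway
```

Whichever mapping you choose, applying it consistently across services matters more than the exact codes, because clients tend to treat all of them as transient.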
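Regarding "And what should the client do? This might easily cause a retry cascade.":
For this situation, the suggested solution is to use a circuit breaker to prevent cascading failure. Once the calls to a dependency keep failing, the breaker stops sending further requests for a while and fails fast instead, so retries do not pile up on a service that is already down.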
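A circuit breaker can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of any particular library: the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions for this example, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None     # None means the circuit is closed

    def call(self, func: Callable[[], object]) -> object:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Half-open: the timeout elapsed, so let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and resets the failure counter.
        self._failures = 0
        self._opened_at = None
        return result
```

In service A, the call to service B (including its internal retries) would be wrapped in `breaker.call(...)`. While the circuit is open, service A could answer immediately, for example with 503 and a Retry-After header, instead of retrying against B again.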
Upvotes: 0