broski2022

Reputation: 55

Configuring retry policy for grpc request

I was trying to configure a client-side retry policy for some gRPC services, but it's not behaving the way I expect, so either I'm misunderstanding how retry policies work in gRPC or there's a mistake in the policy. Here's the policy:

var retryPolicy = `{
        "methodConfig": [{
            "name": [{"service": "serviceA"}, {"service":"serviceB"}],
            "timeout":"30.0s",
            "waitForReady": true,
            "retryPolicy": {
                "MaxAttempts": 10,
                "InitialBackoff": ".5s",
                "MaxBackoff": "10s",
                "BackoffMultiplier": 1.5,
                "RetryableStatusCodes": [ "UNAVAILABLE", "UNKNOWN" ]
            }
        }]
    }`

What I expected: if the client's gRPC request to a method defined in one of the services (serviceA or serviceB) fails, the client retries it. Since waitForReady is true, the client should block the call until a connection is available (or the call is canceled or times out) and retry the call if it fails due to a transient error.

But when I purposely take down the server this request is going to, the client gets an Unavailable gRPC status code with the error: Error while dialing dial tcp xx.xx.xx.xx:xxxx: i/o timeout. The client doesn't get this error 30 seconds later; it receives it right away.

Could the reason be how I'm giving the service names? Does it need the path of the file where the service is defined? For a bit more context, the gRPC service is defined in another package which the client imports. Any help would be greatly appreciated.

Upvotes: 0

Views: 13688

Answers (2)

Archimedes Trajano

Reputation: 41672

When building the ManagedChannel using ManagedChannelBuilder add the following:

builder
  .enableRetry()
  .disableServiceConfigLookUp() // since we're setting it via code
  .defaultServiceConfig(buildServiceConfig())

The service config should look like this

{
  "methodConfig" : [ {
    "name" : [ {
      "service" : ""
    } ],
    "retryPolicy" : {
      "maxBackoff" : "5.0s",
      "maxAttempts" : 5.0,
      "retryableStatusCodes" : [ "UNAVAILABLE" ],
      "backoffMultiplier" : 2.0,
      "initialBackoff" : "0.1s"
    },
    "waitForReady" : true
  } ],
  "loadBalancingConfig" : [ {
    "weighted_round_robin" : { }
  }, {
    "round_robin" : { }
  }, {
    "pick_first" : {
      "shuffleAddressList" : true
    }
  } ]
}

The two key things that are easy to miss are

"name" : [
  {
    "service" : ""
  }
]

This is the first one. An entry with an empty service and no method applies the retryPolicy to all services on the channel by default, so you don't need to explicitly specify the service name.

The second one is

"waitForReady" : true

That tells the channel that it must first wait for the service to be ready before attempting to send. Without this set to true, the retry mechanism does not work, presumably because it assumes that the message has been sent, in which case it won't retry even if the service returns UNAVAILABLE.

So you can create a simple Spring wrapper like this:

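import io.grpc.ManagedChannelBuilder;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
// @NotNull comes from whichever nullability annotation library the project uses.

@Component
public class ManagedChannelBuilderWrapper {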

  @Value("${grpc.retry.backoffMultiplier:2.0}")
  private double retryBackoffMultiplier;

  @Value("${grpc.retry.initialBackoff:100ms}")
  private Duration retryInitialBackoff;

  @Value("${grpc.retry.maxAttempts:5}")
  private int retryMaxAttempts;

  @Value("${grpc.retry.maxBackoff:5s}")
  private Duration retryMaxBackoff;

  public Map<String, Object> buildServiceConfig() {

    return Map.of(
        "loadBalancingConfig",
            List.of(
                Map.of("weighted_round_robin", Map.of()),
                Map.of("round_robin", Map.of()),
                Map.of("pick_first", Map.of("shuffleAddressList", true))),
        "methodConfig",
            List.of(
                Map.of(
                    "name", List.of(Map.of("service", "")),
                    "waitForReady", true,
                    "retryPolicy",
                        Map.of(
                            "maxAttempts",
                                (double) retryMaxAttempts,
                            "initialBackoff", durationToServiceConfigString(retryInitialBackoff),
                            "backoffMultiplier", retryBackoffMultiplier,
                            "maxBackoff", durationToServiceConfigString(retryMaxBackoff),
                            "retryableStatusCodes", List.of("UNAVAILABLE")))));
  }

  @NotNull private String durationToServiceConfigString(@NotNull Duration duration) {

    return (duration.toMillis() / 1000.0) + "s";
  }

  // The wildcard avoids raw-type warnings; ManagedChannelBuilder is generic.
  public ManagedChannelBuilder<?> wrap(@NotNull ManagedChannelBuilder<?> builder) {

    return builder
        .enableRetry()
        .disableServiceConfigLookUp()
        .defaultServiceConfig(buildServiceConfig());
  }
}

Note that maxAttempts is cast to double because the gRPC Map parser does not support Integer or Long.

Upvotes: 4

broski2022

Reputation: 55

Looking through the documentation, I came across this link: https://github.com/grpc/grpc-proto/blob/master/grpc/service_config/service_config.proto and on line 72 it mentions

message Name {
    string service = 1;  // Required. Includes proto package name.
    string method = 2;
}

I wasn't adding the proto package name when listing the services. So the retry policy should be:

var retryPolicy = `{
        "methodConfig": [{
            "name": [{"service": "pkgA.serviceA"}, {"service":"pkgB.serviceB"}],
            "timeout":"30.0s",
            "waitForReady": true,
            "retryPolicy": {
                "MaxAttempts": 10,
                "InitialBackoff": ".5s",
                "MaxBackoff": "10s",
                "BackoffMultiplier": 1.5,
                "RetryableStatusCodes": [ "UNAVAILABLE", "UNKNOWN" ]
            }
        }]
    }`

where pkgA and pkgB are the proto package names.
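
To illustrate why the unqualified name never matched, here is a small standard-library-only Go sketch of the lookup as I understand it: a call is identified by its full method name "/package.Service/Method", and the channel looks for an exact method entry, then a service-level entry, then a default entry with an empty service. The types and the helpers configKeys and applies are hypothetical stand-ins, not grpc-go's actual code, and DoThing is a made-up method name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// name mirrors the Name message in service_config.proto.
type name struct {
	Service string `json:"service"`
	Method  string `json:"method"`
}

type methodConfig struct {
	Name []name `json:"name"`
}

type serviceConfig struct {
	MethodConfig []methodConfig `json:"methodConfig"`
}

// configKeys parses a service config and returns the lookup keys its
// name entries produce: "/svc/method", "/svc/" or "" (channel default).
func configKeys(raw string) (map[string]bool, error) {
	var sc serviceConfig
	if err := json.Unmarshal([]byte(raw), &sc); err != nil {
		return nil, err
	}
	keys := map[string]bool{}
	for _, mc := range sc.MethodConfig {
		for _, n := range mc.Name {
			if n.Service == "" {
				keys[""] = true // empty service: default for every call
				continue
			}
			keys["/"+n.Service+"/"+n.Method] = true
		}
	}
	return keys, nil
}

// applies reports whether any entry covers fullMethod, checking the
// exact method first, then the service, then the default entry.
func applies(keys map[string]bool, fullMethod string) bool {
	if keys[fullMethod] {
		return true
	}
	if i := strings.LastIndex(fullMethod, "/"); i >= 0 && keys[fullMethod[:i+1]] {
		return true
	}
	return keys[""]
}

func main() {
	bare, _ := configKeys(`{"methodConfig":[{"name":[{"service":"serviceA"}]}]}`)
	qualified, _ := configKeys(`{"methodConfig":[{"name":[{"service":"pkgA.serviceA"}]}]}`)

	fmt.Println(applies(bare, "/pkgA.serviceA/DoThing"))      // false
	fmt.Println(applies(qualified, "/pkgA.serviceA/DoThing")) // true
}
```

The bare "serviceA" entry produces the key "/serviceA/", which never equals "/pkgA.serviceA/", so the method config (including the retry policy and timeout) was silently not applied to the call.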

Upvotes: 2
