Using Polly v8 and RestSharp, how would I build a Generic ResiliencePipeline to account for response header retry-after, exceptions, and logging

Question

I have an API client class built using _client = new RestSharp.RestClient(...); with methods of this form:

public async Task PostAsync(string resource, TRequest payload) where TRequest : class
{
    var request = new RestRequest(resource);
    request.AddJsonBody(payload);
    var response = await PolicyFactory.RestSharpPolicy>(_logger).ExecuteAsync(
        async token => await _client.ExecuteAsync>(request, Method.Post, token));
    LogAndRaiseErrors(request, response);
    return response.Data.Data;
}

Here's the LogAndRaiseErrors method:

protected void LogAndRaiseErrors(RestRequest request, RestResponse response)
{
    if (response.StatusCode != HttpStatusCode.OK)
    {
        _logger.LogError("API Error:{response}", SerializeResponse(response));
        throw new BigCommerceException(response.Content);
    }
    _logger.LogDebug("B2B API Call:
{response}", SerializeResponse(response));
}

I've had a look at the Polly documentation, but it is a bit sparse.
How would I construct the ResiliencePipeline used in the PostAsync method to achieve the following primary goals:

Read the response.Headers.FirstOrDefault(h=>h.Name?.ToLower() == "retry-after")?.Value and delay for the specified seconds, which the existing code does not do at all.
I Believe retry-after should affect all threads which may be calling the given API? add Circuit Breaker?

Any pointers on secondary goals would be great:

Log every request (currently done in LogAndRaiseErrorsMethod)
Log when retries occur including the delay specified in the header.
Move the LogAndRaiseErrors(...) into the Policy so the policy can deal with logging and any exceptions
Add Policy to retry for network failures/timeouts etc.
I suspect we need some jitter (is that what that's for?) so not all threads pile on the API at once?

UPDATE here's what I tried:
I think I have retry working correctly, but not circuit breaker. I moved pipeline creation to the constructor so there is only one pipeline (logging shows it's called once)

public ApiClient(..., ILogger logger)
{
    ...
    _client = client;
    _resiliencePipeline = PolicyFactory.GetRestSharpPolicy(_logger);
}

public async Task GetAsync(string resource)
{...}

public async Task PostAsync(string resource, TRequest payload) where TRequest : class
{
    var request = new RestRequest(resource);
    request.AddJsonBody(payload);
    var response = await _resiliencePipeline.ExecuteAsync(
        async token => await _client.ExecuteAsync>(request, Method.Post, token));
    LogAndRaiseErrors(request, response);
    return response.Data.Data;
}

Here's my pipeline creation, but I never see any Circuit Breaker log entries, but I do see a lot of Retry log entries for Retry.

public static class PolicyFactory
{

    public static ResiliencePipeline GetRestSharpPolicy(ILogger logger)
    {
        logger.LogInformation("Building ResiliencePipeline");
        return new ResiliencePipelineBuilder()
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions
            {
                FailureRatio = 0,
                ShouldHandle = new PredicateBuilder()
                    .HandleResult(static result => result.StatusCode == HttpStatusCode.TooManyRequests),
            
                OnOpened = args =>
                {
                    logger.LogWarning("Circuit Breaker Opened on {StatusCode} for {Duration}s ({ResponseUri})",
                        args.Outcome.Result.StatusCode, args.BreakDuration.TotalSeconds, args.Outcome.Result.ResponseUri);
                    return ValueTask.CompletedTask;
                },
                OnClosed = args =>
                {
                    logger.LogWarning("Circuit Breaker Closed on {StatusCode} ({ResponseUri})",
                        args.Outcome.Result.StatusCode, args.Outcome.Result.ResponseUri);
                    return ValueTask.CompletedTask;
                }
            })
            .AddRetry(new RetryStrategyOptions
            {
                ShouldHandle = new PredicateBuilder()
                    .HandleResult(static result => result.StatusCode == HttpStatusCode.TooManyRequests),
                DelayGenerator = delayArgs =>
                {
                    var retryAfter = delayArgs.Outcome.Result.Headers.FirstOrDefault(h => h.Name?.ToLower() == "retry-after")?.Value.ToString();
                    return int.TryParse(retryAfter, out var seconds)
                        ? new ValueTask(TimeSpan.FromSeconds(seconds))
                        : new ValueTask(TimeSpan.FromSeconds(0.5));
                },
                MaxRetryAttempts = 5,
                OnRetry = args =>
                {
                    logger.LogWarning("Retry Attempt:{Attempt} Delay:{Delay}",
                        args.AttemptNumber, args.RetryDelay);
                    return ValueTask.CompletedTask;
                }
            })
            .Build();
    }
}

Peter Csala · Accepted Answer

There are a couple of tiny things that need to be fixed to make it work.

Registration order

There is a huge difference between the following two pipelines

var pipeline = new ResiliencePipelineBuilder()
    .AddCircuitBreaker( new() { ... }) // outer
    .AddRetry(new() { ... }) // inner
    .Build();

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new() { ... }) // outer
    .AddCircuitBreaker( new() { ... }) // inner
    .Build();

There is a concept called escalation. If the inner strategy does not handle the response/exception then it passes the outcome to next strategy in the chain.

So, if you define your pipeline that the inner strategy handles the 429 status code then the outer will never trigger. That's why you have to change the order and move the CB to be the innermost.

We have created a dedicated section on pollydoc to illustrate the differences between the registration orders.

Circuit Breaker settings

Please bear in mind that the default values for CB are sensible for production load, not for testing.

Property	Default Value	Description
`FailureRatio`	0.1	The failure-success ratio that will cause the circuit to break/open. 0.1 means 10% failed of all sampled executions.
`MinimumThroughput`	100	The minimum number of executions that must occur within the specified sampling duration.
`SamplingDuration`	30 seconds	The time period over which the failure-success ratio is calculated.

This means during 30 seconds you should have at least 100 requests with at least 10 failures to open the circuit.

The FailureRatio should be read like this

0: A single failed request is enough to open the circuit
1: 100 requests should fail to open the circuit
0.2: 20 requests should fail to open the circuit

I'm not sure that 0 was intentional or not. If not then I would suggest to play with this code (with different FailureRatio values and different greater than values) to better understand how CB works:

var pipeline = new ResiliencePipelineBuilder()
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 1, //0, 0.2, 0.5
        ShouldHandle = new PredicateBuilder().HandleResult(result => result > 100), // 50, 100, 150, 190
        OnOpened = static args => { Console.WriteLine("Opened"); return default; },
        OnClosed = static args => { Console.WriteLine("Closed"); return default; }
    }).Build();

for(int i = 1; i < 200; i++)
{
    try
    {
       _ = pipeline.Execute(() => i);
    }
    catch(BrokenCircuitException)
    {
        Console.WriteLine($"{i}: BCE");
    }
}

Retry after

In your solution you are using the RetryAfter header only for the retry. But you could use that for the Circuit Breaker as well.

By default the CB breaks for 5 seconds. But you can set the break duration dynamically via the BreakDurationGenerator.

That would shortcut any outgoing requests with a BrokenCircuitException until the RetryAfter is not elapsed. That would require some change in your Retry strategy to handle BrokenCircuitException as well.

Please also bear in mind RetryAfter could contain not just delta value but also date time. That means your TimeSpan.FromSeconds could fail if the retry after was set like this: Retry-After: Wed, 10 Jul 2024 09:35:00 GMT

UPDATE #1

I'm not sure how I should retrieve and share the delay, I don't have access to args.Outcome in the BreakDurationGenerator

Hmmm, that's strange. I thought we have added the Outcome to the BreakDurationGeneratorArguments. ~~Gladly that seems an easy fix I'll fly a PR for that this week.~~ Adding the Outcome in a backward compatible way seems quite challenging. I'm looking for another alternative.

What is the best way to inform Retry the same delay that the CircuitBreaker has.... Should I ditch the circuit breaker and just use retry? am I overcomplicating things?

The suggested way is to use the Context to exchange information between multiple strategies. Here you can see the details.

Using Polly v8 and RestSharp, how would I build a Generic ResiliencePipeline to account for response header retry-after, exceptions, and logging

Answers (1)

Registration order

Circuit Breaker settings

Retry after

Related Questions