diegobarriosdev

Reputation: 433

Unexpected behaviour using nested Retry and Circuit Breaker policies of Polly.Net

I coded a resilience strategy based on retry and circuit-breaker policies. It is working now, but with an issue in its behavior.

I noticed that when the circuit-breaker is half-open and the onBreak() event is executed again (re-opening the circuit), one additional retry is triggered by the retry policy (in addition to the health-check attempt made while half-open).

Let me explain step by step:

I've defined two strongly-typed policies, one for retry and one for the circuit-breaker:

static Policy<HttpResponseMessage> customRetryPolicy;
static Policy<HttpResponseMessage> customCircuitBreakerPolicy;

static HttpStatusCode[] httpStatusesToProcess = new HttpStatusCode[]
{
   HttpStatusCode.ServiceUnavailable,  //503
   HttpStatusCode.InternalServerError, //500
};

The retry policy works this way: two (2) retries per request, waiting five (5) seconds between each retry. If the inner circuit-breaker is open, it must not retry. Retry only for the 500 and 503 HTTP statuses.

customRetryPolicy = Policy<HttpResponseMessage>   

// Do not execute a retry if the circuit is open
.Handle<BrokenCircuitException>( x => 
{
    return !(x is BrokenCircuitException);
})

// Stop if some inner exception matches BrokenCircuitException
.OrInner<AggregateException>(x =>
{
    return !(x.InnerException is BrokenCircuitException);
})

// Retry if the status is one of:
.OrResult(x => { return httpStatusesToProcess.Contains(x.StatusCode); })

// Retry request two times, wait 5 seconds between each retry
.WaitAndRetry( 2, retryAttempt => TimeSpan.FromSeconds(5),
    (exception, timeSpan, retryCount, context) =>
    {
        System.Console.WriteLine("Retrying... " + retryCount);
    }
);

The circuit-breaker policy works this way: allow a maximum of three (3) failures in a row, then open the circuit for thirty (30) seconds. Open the circuit ONLY for HTTP-500.

customCircuitBreakerPolicy = Policy<HttpResponseMessage>

// handle the result or exception that should trigger the onBreak delegate
.Handle<AggregateException>(x => 
    { return x.InnerException is HttpRequestException; })

// only break when the server error is InternalServerError
.OrResult(x => { return (int) x.StatusCode == 500; })

// Break when it fails 3 times in a row,
// and hold the circuit open for 30 seconds
.CircuitBreaker(3, TimeSpan.FromSeconds(30),
    onBreak: (lastResponse, breakDelay) =>{
        System.Console.WriteLine("\n Circuit broken!");
    },
    onReset: () => {
        System.Console.WriteLine("\n Circuit Reset!");
    },
    onHalfOpen: () => {
        System.Console.WriteLine("\n Circuit is Half-Open");
    }); 

Finally, those two policies are being nested this way:

try
{
    customRetryPolicy.Execute(() =>
        customCircuitBreakerPolicy.Execute(() =>
        {
            // For testing purposes, "api/values" returns 500 every time
            HttpResponseMessage msResponse
                = GetHttpResponseAsync("api/values").Result;

            // This just prints messages to the console; no need to pay attention to it
            PrintHttpResponseAsync(msResponse);

            return msResponse;
        }));
}
catch (BrokenCircuitException e)
{
    System.Console.WriteLine("CB Error: " + e.Message);
}

Here is the behavior I observe, step by step:

  1. The first server response is HTTP-500 (as expected)
  2. Retry#1, Failed (as expected)
  3. Retry#2, Failed (as expected)
  4. As we now have three faults, the circuit-breaker opens (as expected)
  5. GREAT! This is working, perfectly!
  6. Circuit-breaker is open for the next thirty (30) seconds (as expected)
  7. Thirty seconds later, circuit-breaker is half-open (as expected)
  8. One attempt to check the endpoint health (as expected)
  9. Server response is a HTTP-500 (as expected)
  10. Circuit-breaker is open for the next thirty (30) seconds (as expected)
  11. HERE IS THE PROBLEM: an additional retry is launched even though the circuit-breaker is already open again!

Look at the console output screenshots attached to the original post.

I'm trying to understand this behavior. Why is one additional retry being executed when the circuit-breaker opens for the second, third, ..., Nth time?

I've reviewed the state machine models for the retry and circuit-breaker policies, but I don't understand why this additional retry is being performed.

Flow for the circuit-breaker: https://github.com/App-vNext/Polly/wiki/Circuit-Breaker#putting-it-all-together-

Flow for the retry policy: https://github.com/App-vNext/Polly/wiki/Retry#how-polly-retry-works

This really matters, because the retry delay is still awaited (5 seconds in this example), and in the end this is a waste of time under high concurrency.

Any help / direction will be appreciated. Many thanks.

Upvotes: 2

Views: 4639

Answers (2)

diegobarriosdev

Reputation: 433

I tried this same scenario with the sample provided by Polly, AsyncDemo06_WaitAndRetryNestingCircuitBreaker.cs, found in Polly-Samples/PollyTestClient/Samples/.

See the sample here: Official Samples provided by Polly

Running it confirmed that this behavior does not happen only with my sample; it happens with the official sample as well.


After this confirmation, I added one additional condition to the retry policy's evaluation so that retrying depends on the circuit-breaker state. This works!

Pay attention to the new condition in the .OrResult() delegate:

customRetryPolicy = Policy<HttpResponseMessage>

    .Handle<BrokenCircuitException>(
        x => { return !(x is BrokenCircuitException); })

    .OrInner<AggregateException>(
        x => { return !(x.InnerException is BrokenCircuitException); })

    // Retry if the HTTP status is one to process AND the circuit-breaker is closed
    .OrResult(x => {
        return httpStatusesToProcess.Contains(x.StatusCode)

            // This condition evaluates the current state of the
            // circuit-breaker before each retry
            && ((CircuitBreakerPolicy<HttpResponseMessage>)
                customCircuitBreakerPolicy).CircuitState == CircuitState.Closed;
    })

    .WaitAndRetry(2, retryAttempt => TimeSpan.FromSeconds(1),
        (exception, timeSpan, retryCount, context) =>
        {
            System.Console.WriteLine("Retrying... " + retryCount);
        }
    );

This is the result (console output screenshot attached in the original answer).

My assumption was that the retry policy (the outermost policy) would be able to control whether to retry or not by checking the circuit-breaker status. And that does happen, BUT for some reason, when the circuit-breaker is half-open and the health request comes back with a negative response, during the transition from half-open back to open the penalty of one extra try is executed (just like @peter-csala said before).

I forced the circuit-breaker state to be evaluated when this happens, but I consider that Polly should perform this check by itself.
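As a side note, here is a small simplification (my own sketch, not part of the original workaround): if the circuit-breaker field is declared with the concrete CircuitBreakerPolicy<HttpResponseMessage> type instead of the base Policy<HttpResponseMessage>, its CircuitState property can be read directly and the cast inside the .OrResult() delegate is no longer needed.

// Sketch only: the concrete type exposes CircuitState without a cast.
// The .CircuitBreaker(...) call already returns a CircuitBreakerPolicy<HttpResponseMessage>,
// so the existing assignment keeps compiling.
static CircuitBreakerPolicy<HttpResponseMessage> customCircuitBreakerPolicy;

// The retry condition then simply becomes:
//     && customCircuitBreakerPolicy.CircuitState == CircuitState.Closed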

Upvotes: 2

Peter Csala

Reputation: 22829

With Polly.Context you can exchange information between the two policies (in your case: Retry and Circuit Breaker). The Context is basically a Dictionary<string, object>.
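As a quick illustration before the policies themselves (my own sketch, not part of this answer's code below): a Context can also be passed explicitly to ExecuteAsync; if you do not pass one, Polly creates an empty Context for that execution and still shares it across every policy in the wrap. The strategy and Get names used here are the ones defined further down.

// Sketch only: passing a Context explicitly through a policy wrap.
var context = new Polly.Context();     // behaves like an IDictionary<string, object>
context["someKey"] = "someValue";

HttpResponseMessage response = await strategy.ExecuteAsync(
    ctx => Get(),   // the delegate receives the same Context instance
    context);       // ...that onBreak / sleepDurationProvider receive below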

So, the trick is to set a key on the Context in onBreak and then use that value inside the sleepDurationProvider.

Let's start with the inner Circuit Breaker policy:

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return Policy<HttpResponseMessage>
        .HandleResult(res => res.StatusCode == HttpStatusCode.InternalServerError)
        .CircuitBreakerAsync(3, TimeSpan.FromSeconds(2),
           onBreak: (dr, ts, ctx) => { ctx[SleepDurationKey] = ts; },
           onReset: (ctx) => { ctx[SleepDurationKey] = null; });
}
  • It breaks after 3 consecutive failed requests
  • It stays in the Open state for 2 seconds before it transitions to HalfOpen
  • It sets a key on the context with the durationOfBreak value
  • When the CB goes back to the "normal" Closed state (onReset), it clears this value

Now, let's continue with the Retry policy:

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.InternalServerError)
    .Or<BrokenCircuitException>()
    .WaitAndRetryAsync(4,
        sleepDurationProvider: (c, ctx) =>
        {
            if (ctx.ContainsKey(SleepDurationKey))
                return (TimeSpan)ctx[SleepDurationKey];
            return TimeSpan.FromMilliseconds(200);
        },
        onRetry: (dr, ts, ctx) =>
        {
            Console.WriteLine($"Context: {(ctx.ContainsKey(SleepDurationKey) ? "Open" : "Closed")}");
            Console.WriteLine($"Waits: {ts.TotalMilliseconds}");
        });
}
  • It triggers either when the StatusCode is 500
    • or when there was a BrokenCircuitException
  • It triggers at most 4 times (so, all in all 5 attempts)
  • It sets the sleep duration based on the context
    • If the key is not present in the context (the CB is Closed), then it returns 200 milliseconds
    • If the key is present in the context (the CB has broken and is Open), then it returns the value from the context
      • NOTE: You can add a few hundred milliseconds to this value to avoid a race condition (see the sketch after this list)
  • It prints some values to the console inside onRetry, only for debugging purposes
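Here is a minimal sketch of that NOTE (my addition, not from the original answer): padding the break duration taken from the context with a small buffer, so the retry cannot fire a moment before the circuit actually transitions to HalfOpen.

// Sketch only: add a small buffer on top of the circuit-breaker's break duration.
sleepDurationProvider: (c, ctx) =>
{
    if (ctx.ContainsKey(SleepDurationKey))
        return ((TimeSpan)ctx[SleepDurationKey]) + TimeSpan.FromMilliseconds(300);
    return TimeSpan.FromMilliseconds(200);
},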

Finally, let's wire up the policies and test it:

const string SleepDurationKey = "Broken"; 
static HttpClient client = new HttpClient();
static async Task Main()
{
    var strategy = Policy.WrapAsync(GetRetryPolicy(), GetCircuitBreakerPolicy());
    await strategy.ExecuteAsync(async () => await Get());
}

static Task<HttpResponseMessage> Get()
{
    return client.GetAsync("https://httpstat.us/500");
}
  • It uses the https://httpstat.us website to emulate an overloaded downstream
  • It combines/chains the two policies (CB is the inner, Retry is the outer)
  • It calls the Get method in an asynchronous way

when handledEventsAllowedBeforeBreaking is 2

The output

Context: Closed
Waits: 200
Context: Open
Waits: 2000
Context: Open
Waits: 2000
Context: Open
Waits: 2000

when handledEventsAllowedBeforeBreaking is 3

The output

Context: Closed
Waits: 200
Context: Closed
Waits: 200
Context: Open
Waits: 2000
Context: Open
Waits: 2000

when handledEventsAllowedBeforeBreaking is 4

The output

Context: Closed
Waits: 200
Context: Closed
Waits: 200
Context: Closed
Waits: 200
Context: Open
Waits: 2000

Upvotes: 7
