Xavier W.

Reputation: 1360

Proper way to handle CosmosDb Mongo 429 error with Polly

I am deploying an application on Azure Web Apps that interacts with a CosmosDB database using the MongoDB driver in .NET Core 3.

Following this documentation, I have to set a retry policy to handle the 429 error code that is returned when not enough RU/s are available. I can't find a proper way to write the policy with Polly, because I have never seen the actual error that is raised when a 429 occurs.

The only way I have found so far is the following policy:

    _retryPolicy = Policy
        .Handle<MongoCommandException>(r => r.Message.Contains("Request rate is large"))
        .WaitAndRetry(3, i => TimeSpan.FromSeconds(1));

And here is how the Polly policy is used:

    public long CountProjetByProjectNumber(string projectNumber)
    {
        long result = 0;
        _retryPolicy.Execute(() =>
        {
            result = _mongoCollection.CountDocuments(x => x.ProjectNumber == projectNumber);
        });
        return result;
    }

Does anyone know the exact error that is raised when the 429 exception occurs in CosmosDB with the MongoDB driver, or can someone show me how they handled it properly?

Upvotes: 3

Views: 3247

Answers (1)

Adriaan de Beer

Reputation: 1286

There are actually a few more exceptions you'll have to handle in order to properly deal with rate limiting and timeouts - especially when using the newer MongoDB V3.6 endpoint (as opposed to the older V3.2 endpoint).

  • For V3.2 endpoints: The two exceptions you care about are MongoCommandException and MongoExecutionTimeoutException. MongoCommandException exposes a BsonDocument in its Result property. That document has a StatusCode field you can use to detect 429. That said, in my testing I also had to handle the Http Service Unavailable (1) and Operation Exceeded Time Limit (50) status codes.
  • For V3.6 endpoints: You probably also want to handle MongoWriteException and MongoBulkWriteException. These exceptions include a RetryAfterMs= value in the exception message (though not always!). Unfortunately, this value does not seem to be exposed directly through a class property, most likely because it is a CosmosDB-specific feature and so does not map onto the exceptions defined by the MongoDB driver.

The code below targets .NET Standard 2.0 and should give you a solid starting point. You will definitely want to tweak some of the constants based on your circumstances and testing.

    public static class Policies
    {
        public const int HttpThrottleErrorCode = 429;
        public const int HttpServiceIsUnavailable = 1;
        public const int HttpOperationExceededTimeLimit = 50;
        public const int RateLimitCode = 16500;
        public const string RetryAfterToken = "RetryAfterMs=";
        public const int MaxRetries = 10;
        public static readonly int RetryAfterTokenLength = RetryAfterToken.Length;

        // System.Random is not thread-safe; guard it with a lock if these policies may run concurrently.
        private static readonly Random JitterSeed = new Random();

        public static readonly IAsyncPolicy NoPolicy = Policy.NoOpAsync();

        public static Func<int, TimeSpan> SleepDurationProviderWithJitter(double exponentialBackoffInSeconds, int maxBackoffTimeInSeconds) => retryAttempt
            => TimeSpan.FromSeconds(Math.Min(Math.Pow(exponentialBackoffInSeconds, retryAttempt), maxBackoffTimeInSeconds)) // exponential back-off: base^attempt seconds, capped at the maximum (2, 4, 8 etc. for a base of 2)
               + TimeSpan.FromMilliseconds(JitterSeed.Next(0, 1000)); // plus some jitter: up to 1 second

        public static readonly Func<int, TimeSpan> DefaultSleepDurationProviderWithJitter =
            SleepDurationProviderWithJitter(1.5, 23);


        public static readonly IAsyncPolicy MongoCommandExceptionPolicy = Policy
            .Handle<MongoCommandException>(e =>
            {
                if (e.Code != RateLimitCode || !(e.Result is BsonDocument bsonDocument))
                {
                    return false;
                }

                if (bsonDocument.TryGetValue("StatusCode", out var statusCode) && statusCode.IsInt32)
                {
                    switch (statusCode.AsInt32)
                    {
                        case HttpThrottleErrorCode:
                        case HttpServiceIsUnavailable:
                        case HttpOperationExceededTimeLimit:
                            return true;
                        default:
                            return false;
                    }
                }

                if (bsonDocument.TryGetValue("IsValid", out var isValid) && isValid.IsBoolean)
                {
                    return isValid.AsBoolean;
                }

                return true;
            })
            .WaitAndRetryAsync(
                retryCount: MaxRetries,
                sleepDurationProvider: DefaultSleepDurationProviderWithJitter
            );

        public static readonly IAsyncPolicy ExecutionTimeoutPolicy = Policy
            .Handle<MongoExecutionTimeoutException>(e =>
                e.Code == RateLimitCode || e.Code == HttpOperationExceededTimeLimit
            )
            .WaitAndRetryAsync(
                retryCount: MaxRetries,
                sleepDurationProvider: DefaultSleepDurationProviderWithJitter
            );

        public static readonly IAsyncPolicy MongoWriteExceptionPolicy = Policy
            .Handle<MongoWriteException>(e =>
            {
                return e.WriteError?.Code == RateLimitCode
                       || (e.InnerException is MongoBulkWriteException bulkException &&
                           bulkException.WriteErrors.Any(error => error.Code == RateLimitCode));
            })
            .WaitAndRetryAsync(
                retryCount: MaxRetries,
                sleepDurationProvider: (retryAttempt, e, ctx) =>
                {
                    var timeToWaitInMs = ExtractTimeToWait(e.Message);
                    if (!timeToWaitInMs.HasValue && e.InnerException != null)
                    {
                        timeToWaitInMs = ExtractTimeToWait(e.InnerException.Message);
                    }
                    return timeToWaitInMs ?? DefaultSleepDurationProviderWithJitter(retryAttempt);
                },
                onRetryAsync: (e, ts, i, ctx) => Task.CompletedTask
            );

        public static readonly IAsyncPolicy MongoBulkWriteExceptionPolicy = Policy
            .Handle<MongoBulkWriteException>(e =>
            {
                return e.WriteErrors.Any(error => error.Code == RateLimitCode);
            })
            .WaitAndRetryAsync(
                retryCount: MaxRetries,
                sleepDurationProvider: (retryAttempt, e, ctx) =>
                {
                    var timeToWaitInMs = ExtractTimeToWait(e.Message);
                    return timeToWaitInMs ?? DefaultSleepDurationProviderWithJitter(retryAttempt);
                },
                onRetryAsync: (e, ts, i, ctx) => Task.CompletedTask
            );

        /// <summary>
        /// It doesn't seem like RetryAfterMs is a property value - so unfortunately, we have to extract it from a string... (crazy??!)
        /// </summary>
        private static TimeSpan? ExtractTimeToWait(string messageToParse)
        {
            var retryPos = messageToParse.IndexOf(RetryAfterToken, StringComparison.OrdinalIgnoreCase);
            if (retryPos >= 0)
            {
                retryPos += RetryAfterTokenLength;
                var endPos = messageToParse.IndexOf(',', retryPos);
                if (endPos > 0)
                {
                    var timeToWaitInMsString = messageToParse.Substring(retryPos, endPos - retryPos);
                    if (Int32.TryParse(timeToWaitInMsString, out int timeToWaitInMs))
                    {
                        return TimeSpan.FromMilliseconds(timeToWaitInMs)
                               + TimeSpan.FromMilliseconds(JitterSeed.Next(100, 1000));
                    }
                }
            }
            return default;
        }

        /// <summary>
        /// Use this policy if your CosmosDB MongoDB endpoint is V3.2
        /// </summary>
        public static readonly IAsyncPolicy DefaultPolicyForMongo3_2 = Policy.WrapAsync(MongoCommandExceptionPolicy, ExecutionTimeoutPolicy);

        /// <summary>
        /// Use this policy if your CosmosDB MongoDB endpoint is V3.6 or V3.2
        /// </summary>
        public static readonly IAsyncPolicy DefaultPolicyForMongo3_6 = Policy.WrapAsync(MongoCommandExceptionPolicy, ExecutionTimeoutPolicy, MongoWriteExceptionPolicy, MongoBulkWriteExceptionPolicy);
    }

    public static IAsyncPolicy DefaultPolicy { get; set; } = Policies.DefaultPolicyForMongo3_6;
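
To tie this back to the question, here is a minimal sketch of how the count query might be routed through one of these policies. It assumes the Policies class above is in scope. The Project type and the ProjectRepository class are hypothetical names made up for illustration, and only the ProjectNumber field comes from the question.

```csharp
using System.Threading.Tasks;
using MongoDB.Driver;
using Polly;

// Hypothetical document type, modelled on the ProjectNumber field used in the question.
public class Project
{
    public string ProjectNumber { get; set; }
}

// Hypothetical repository that wraps every MongoDB call in a retry policy.
public class ProjectRepository
{
    private readonly IMongoCollection<Project> _mongoCollection;
    private readonly IAsyncPolicy _retryPolicy;

    public ProjectRepository(IMongoCollection<Project> mongoCollection, IAsyncPolicy retryPolicy = null)
    {
        _mongoCollection = mongoCollection;
        // Fall back to the V3.6 policy defined in the Policies class above.
        _retryPolicy = retryPolicy ?? Policies.DefaultPolicyForMongo3_6;
    }

    public Task<long> CountProjectsByProjectNumberAsync(string projectNumber)
    {
        // ExecuteAsync returns the delegate's result, so no captured local variable is needed.
        return _retryPolicy.ExecuteAsync(
            () => _mongoCollection.CountDocumentsAsync(x => x.ProjectNumber == projectNumber));
    }
}
```

Because these policies are declared as IAsyncPolicy, they have to be run with ExecuteAsync and the async driver methods. The synchronous Execute call from the question would need a synchronous policy built with WaitAndRetry instead.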

Upvotes: 9
