jsk

Reputation: 85

Can I use a retry policy in an Azure Function?

I'm using Event Hubs to temporarily store data, which will first be saved to Azure Table Storage and then indexed into Elasticsearch. I was thinking of doing the storage calls in an Azure Function, and doing the same for the Elasticsearch indexing using NEST. It is important that the data gets processed, so I was planning to use Polly as a retry policy in case the Elasticsearch server is failing. However, won't a retry policy potentially make the Azure Function expensive? Are Azure Functions even the right way to go?

Upvotes: 0

Views: 3141

Answers (2)

Mikhail Shilkov

Reputation: 35124

Yes, you can use Polly for retries inside your Azure Functions. Some further considerations:

  • Yes, you will pay for the retry time. But given that your Elasticsearch cluster is "mostly up", the extra cost of occasional retries should not be too high.

  • If you want to retry saving to Table Storage too, you will have to write the calls decorated with Polly yourself instead of using the otherwise preferred output binding.

  • Check whether the order of writes matters to you, i.e. whether you should retry Table Storage writes to completion before you start writing to Elasticsearch, or vice versa. If order doesn't matter, you can run both writes in parallel as async tasks and then wait for both with Task.WhenAll (Task.WaitAll also works, but it blocks the calling thread).

  • On the Consumption plan, the maximum execution time of a Function is 5 minutes by default, and you can configure it up to 10 minutes at most. If you need to handle outages longer than that, you probably need a plan B: for example, start copying the events that have been failing for longer than 4 (or 9) minutes to a dedicated queue and retry from there, or disable the Function during such periods of downtime.
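The advice above can be sketched in JavaScript (the language of the other answer's snippets) instead of C# with Polly. This is a minimal sketch under stated assumptions, not Polly's API: `retryWithinBudget`, `writeEvent`, and the injected `saveToTable`, `indexToElastic`, and `enqueueForLater` callbacks are hypothetical names. Each write is retried until a time budget (4 minutes by default, assuming the default 5-minute timeout) runs out, the two writes run in parallel with `Promise.all`, and an event that still fails is handed to a plan-B callback such as a queue writer.

```javascript
// Minimal sketch, not Polly: retry an async operation until a time budget runs out,
// then hand the work to a plan-B callback (e.g. a hypothetical queue writer).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithinBudget(operation, { budgetMs, delayMs, fallback }) {
  const deadline = Date.now() + budgetMs;
  let attempts = 0;
  for (;;) {
    attempts++;
    try {
      const value = await operation();
      return { status: "written", attempts, value };
    } catch (err) {
      // Not enough budget left for another wait plus attempt: switch to plan B.
      if (Date.now() + delayMs >= deadline) {
        await fallback(err);
        return { status: "deferred", attempts, error: err };
      }
      await sleep(delayMs);
    }
  }
}

// Hypothetical writers are injected so the sketch stays self-contained.
// Both writes run in parallel; use two sequential awaits instead if order matters.
// Default budget of 4 minutes leaves headroom under an assumed 5-minute timeout.
async function writeEvent(event, writers, { budgetMs = 4 * 60 * 1000, delayMs = 5000 } = {}) {
  const { saveToTable, indexToElastic, enqueueForLater } = writers;
  const options = (target) => ({
    budgetMs,
    delayMs,
    fallback: () => enqueueForLater(target, event),
  });
  return Promise.all([
    retryWithinBudget(() => saveToTable(event), options("table")),
    retryWithinBudget(() => indexToElastic(event), options("elasticsearch")),
  ]);
}
```

If the Table Storage write must finish before indexing starts, replace `Promise.all` with two sequential `await` calls.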

Upvotes: 2

evilSnobu

Reputation: 26314

Yes, it is. You could use a library, or better, just write a simple fixed-delay retry strategy (for example, try 5 times with a 5-second sleep in between) and do something like

context.log.error({
    message: `Transient failure. This is Retry number ${retryCount}.`,
    errorCode: errorCodeFromCallingElasticSearch,
    errorDetails: moreContextMaybeSomeStack
});

every time you hit the retry logic, so that it goes to Application Insights (make sure you integrate with App Insights; otherwise you have no ops, or completely dark ops).
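A minimal sketch of that fixed-delay loop, assuming `operation` is an async call (for example, a hypothetical Elasticsearch indexing call) that rejects on transient failures, optionally with a `code` on the error. `retryFixedDelay` and its options are illustrative names, not part of any Azure or Elasticsearch API; inside a Node.js function you could pass `(entry) => context.log.error(entry)` as `log`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fixed-delay retry: up to `maxAttempts` tries with `delayMs` between them.
// `log` receives one structured entry per retry, mirroring the snippet above.
async function retryFixedDelay(operation, log, { maxAttempts = 5, delayMs = 5000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of attempts: let the failure surface
      log({
        message: `Transient failure. This is Retry number ${attempt}.`,
        errorCode: err.code, // whatever code the failing client attached, if any
        errorDetails: err.stack,
      });
      await sleep(delayMs);
    }
  }
}
```

This sketch retries every rejection; in practice you would probably only retry errors you know to be transient.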

You can then query how often a call really misses and get an idea of how well things go at the 95th percentile.
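Application Insights computes percentiles server-side in its own query language; the sketch below only illustrates what "the 95th percentile of retries" means, using the nearest-rank method. The `percentile` helper and the `retryCounts` sample data are purely illustrative, not real telemetry.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical per-invocation retry counts, as they might be read back from the logs.
const retryCounts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4];
console.log(percentile(retryCounts, 95)); // prints 1
```

Here 95% of invocations needed at most one retry, even though the worst one needed four.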

Occasionally running 10 seconds over the normal 1-second execution time of your function is going to cost extra, but probably nowhere near as much as a full dedicated App Service plan. If it comes close, just switch to that: it means your function is mostly on rather than off, which is still a perfectly good case for running a function.
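A back-of-the-envelope version of that cost argument: on the Consumption plan you pay per execution plus for execution time (GB-seconds), and retries inside one invocation add time but not executions. At constant memory, the extra cost therefore tracks the increase in average duration. The `relativeCostIncrease` helper and the 2% retry rate below are assumptions for illustration, not measurements.

```javascript
// Relative increase in average execution time when a fraction of invocations
// spend extra seconds in the retry path (memory assumed constant).
function relativeCostIncrease(normalSeconds, extraSeconds, retriedFraction) {
  const averageSeconds = normalSeconds + retriedFraction * extraSeconds;
  return averageSeconds / normalSeconds - 1;
}

// Assumed figures: 1 s normal run, 10 s extra on retry, 2% of invocations retry.
console.log(relativeCostIncrease(1, 10, 0.02).toFixed(2)); // prints 0.20, i.e. about 20% more
```

The per-execution part of the bill is unchanged, so the real increase would be somewhat smaller than this time-only estimate.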

App Insights can also trigger alerts when a metric goes haywire: if, say, your retry count climbs to 11 for 24 hours, you probably want to know about that deviation. To trigger an alert off the retry count, you'll need to send it as a custom metric:

context.log.metric("CallElasticSearchRetryCount", retryCount);

Upvotes: 0
