Reputation: 169

Why does it take so long to get through first level retries?

I've just started playing around with NServiceBus on Azure, and for some reason it takes a long time to get through the first level retries when a message handler throws an exception. With retries set to 5 it takes 20+ minutes before the second level retries kick in.

What is causing the delay?

Here's how I'm configuring the bus:

Configure.Transactions.Advanced(s =>
{
    s.DisableDistributedTransactions();
    s.DoNotWrapHandlersExecutionInATransactionScope();
});

Configure.With()
    .AutofacBuilder(container)
    .DefiningCommandsAs(t => t.IsCommand())
    .DefiningEventsAs(t => t.IsEvent())
    .XmlSerializer()
    .MessageForwardingInCaseOfFault()
    .AzureConfigurationSource()
    .UseTransport<AzureStorageQueue>()
    .AzureDiagnosticsLogger()                     
    .AzureMessageQueue()                     
    .AzureSubcriptionStorage()                     
    .UseAzureTimeoutPersister() 
    .UnicastBus()                     
    .RunHandlersUnderIncomingPrincipal(false);

FYI: I'm using NServiceBus built from the develop branch as of today and running in the emulator.

Upvotes: 0

Answers (3)

Yves Goeleven

Reputation: 2185

Oh, I misread the question; I thought it was taking 20 minutes after the last retry for the second level to kick in. But then I know what this is, and it's configurable!

To support batching (which lowers the cost), the message's invisibility time is calculated by multiplying the individual MessageInvisibleTime by the BatchSize. The default MessageInvisibleTime is 30000 (milliseconds) and the default BatchSize is 10, so each failed attempt keeps the message invisible for 30 s × 10 = 300 s, or 5 minutes. Multiply that by 5 first level retries and you end up with 25 minutes before the retries are exhausted and the second level kicks in.

You can reconfigure this if you like: MessageInvisibleTime and BatchSize are properties on AzureQueueConfig, and MaxRetries sits on TransportConfig (in 4.0) or MsmqTransportConfig (in 3.X).
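As a rough sketch of what that reconfiguration might look like in an app.config or web.config, assuming the 4.0 section names: the element and attribute names below are taken from the properties mentioned above, the values are purely illustrative, and the matching `<configSections>` declarations are left out because their exact type and assembly names depend on the NServiceBus version in use.

```xml
<!-- Illustrative values only; section declarations in <configSections> are omitted. -->
<!-- Assumed formula from the answer: delay = MessageInvisibleTime x BatchSize x MaxRetries -->

<!-- 5000 ms x 1 = 5 s of invisibility per failed attempt -->
<AzureQueueConfig MessageInvisibleTime="5000" BatchSize="1" />

<!-- 5 first level retries x 5 s = roughly 25 s before second level retries start -->
<TransportConfig MaxRetries="5" />
```

Keep in mind that a smaller BatchSize trades the shorter retry delay for more storage transactions, which is the cost the batching was meant to reduce.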

Upvotes: 2

user2292703

Reputation: 1

I was under the impression that first level retries did not need a timeout persister (I was not even aware of its existence, to be honest) and that first level retries were driven only by the peek lock/invisible time of messages in the Azure queue.

For second level retries I would expect the timeoutpersister to play a role (now that I know it exists...).

Yves, correct me if I am wrong.

Upvotes: 0

Yves Goeleven

Reputation: 2185

Can you open an issue on GitHub for this, with a repro if possible? The repository is at http://www.github.com/nservicebus/nservicebus

I suspect the delay comes from the Azure timeout persister, as that is the one responsible for managing the time between retries. Yet 20 minutes seems like a really odd number, so I have no immediate explanation for the observed behavior.

In the meantime, can you try using the in-memory timeout persister and see if the issue disappears? That would confirm my hypothesis.

Upvotes: 0
