Reputation: 1451
Let's say I have a method that iterates over all rows in a ReliableDictionary like so:
    var reliableDictionary = await StateManager.GetOrAddAsync<IReliableDictionary<TKey, TValue>>(dictionaryName);
    using (var tx = StateManager.CreateTransaction())
    {
        var enumerable = await reliableDictionary.CreateEnumerableAsync(tx);
        var enumerator = enumerable.GetAsyncEnumerator();
        while (await enumerator.MoveNextAsync(cancellationToken))
        {
            // Read enumerator.Current and do something with the value
            // (not writing back to the dictionary here)
        }
    }
How could I handle retrying of transient exceptions here (i.e., TimeoutException, FabricNotReadableException and FabricTransientException)?
The code documentation for the enumerator is unclear on which exceptions each method can throw. Which of these can throw the transient exceptions above: CreateTransaction, CreateEnumerableAsync, GetAsyncEnumerator, MoveNextAsync and enumerator.Current?
If a transient exception is thrown from one of these methods, how should I retry?
If a transient exception is thrown from MoveNextAsync or enumerator.Current, can I retry it without aborting the while loop, or should I create a whole new transaction and start enumerating from the beginning again?
Upvotes: 2
Views: 1430
Reputation: 3315
This article https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-work-with-reliable-collections describes working with Reliable Collections under transactions. Basically you should do the following:
    retry:
    try {
        // Create a new Transaction object for this partition
        using (ITransaction tx = base.StateManager.CreateTransaction()) {
            // AddAsync takes key's write lock; if >4 secs, TimeoutException
            await m_dic.AddAsync(tx, key, value, cancellationToken);
            await tx.CommitAsync();
        }
    }
    catch (TimeoutException) {
        await Task.Delay(100, cancellationToken); goto retry;
    }
The sample uses a goto statement, but any retry mechanism that creates a new transaction on each attempt should work.
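As one alternative to goto, here is a minimal sketch of a bounded retry loop with exponential backoff. RetryHelper, ExecuteAsync and all of their parameters are hypothetical names made up for this illustration; they are not part of the Service Fabric SDK, and the caller decides which exceptions count as transient.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of the Service Fabric SDK.
public static class RetryHelper
{
    // Runs `operation` until it succeeds, a non-transient exception is thrown,
    // or `maxAttempts` attempts have been made (the last failure then propagates).
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan initialDelay,
        CancellationToken cancellationToken)
    {
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts remaining:
                // fall through to the delay below, then try again.
            }

            await Task.Delay(delay, cancellationToken);
            delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the wait each time
        }
    }
}
```

The delegate passed as operation should create its own transaction inside the call, so that every attempt starts with a fresh one, as the goto sample does.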
You can increase the timeout if you know your transaction will take longer (as it will in your case), but you should consider the impact that might have on your solution. The guidelines at https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-reliable-collections say:
The default time-out is 4 seconds for all the Reliable Collection APIs. Most users should not override this.
And
Do not use TimeSpan.MaxValue for time-outs. Time-outs should be used to detect deadlocks.
As for the other exception types you mention (FabricNotReadableException and FabricTransientException), you could/should retry those as well. Service Fabric commonly throws them when something changes in the configuration of your service(s), such as a change of primary, or when you for some reason end up talking to a secondary. In most cases the operation should be retryable. FabricTransientException is just a base class for a number of exceptions that can occur when communicating with Reliable Services; it indicates a failure that could go away if the operation is retried.
This answer describes FabricNotReadableException. For instance, there are some cases where you need to re-resolve your service in the client so that you end up on another replica.
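Applied to the enumeration in the question, a conservative sketch could look like the following. The material quoted above does not say whether a transient failure in MoveNextAsync can be retried in place, so this version assumes the safe route: discard the transaction and start enumerating from the beginning with a new one. DictionaryScanner, ScanAsync and the visit callback are hypothetical names, and the 100 ms delay is simply carried over from the sample above.

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;

// Hypothetical helper for illustration only.
public static class DictionaryScanner
{
    public static async Task ScanAsync<TKey, TValue>(
        IReliableStateManager stateManager,
        string dictionaryName,
        Action<TKey, TValue> visit,
        CancellationToken cancellationToken)
        where TKey : IComparable<TKey>, IEquatable<TKey>
    {
        while (true)
        {
            try
            {
                var dictionary = await stateManager
                    .GetOrAddAsync<IReliableDictionary<TKey, TValue>>(dictionaryName);

                // Assumption: every attempt gets a fresh transaction and enumerator.
                using (ITransaction tx = stateManager.CreateTransaction())
                {
                    var enumerable = await dictionary.CreateEnumerableAsync(tx);
                    using (var enumerator = enumerable.GetAsyncEnumerator())
                    {
                        while (await enumerator.MoveNextAsync(cancellationToken))
                        {
                            visit(enumerator.Current.Key, enumerator.Current.Value);
                        }
                    }
                }
                return; // enumeration completed without a transient failure
            }
            catch (Exception ex) when (ex is TimeoutException
                                       || ex is FabricNotReadableException
                                       || ex is FabricTransientException)
            {
                // Transient failure: fall through, wait, and restart the scan.
            }

            await Task.Delay(100, cancellationToken);
        }
    }
}
```

Because the scan restarts from the first row, visit can be called more than once for the same key, so it should be idempotent or collect into a buffer that is cleared on each restart. The loop also retries indefinitely until cancelled; a cap on attempts may be preferable in practice.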
Upvotes: 2