Reputation: 227
We have a .NET Core API running in production which can run stable for days or even weeks and then suddenly freezes. Such a freeze can even happen multiple times a day, completely random. What happens: the code seems to be frozen and doesn't accept any new requests. No new requests are logged, the thread count rises sky-high and the memory rises steadily until it's maxed out.
I created a memory dump to analyze. It tells me that most threads are waiting for a lock to be released at a specific function, looking like a deadlock. I analysed this function and cannot see why this would cause issues. Can someone help me out? Obviously I suspect AsParallel() to be thread unsafe, but the internet says no, it is thread safe.
public async Task<bool> TryStorePropertiesAsync(string sessionId, Dictionary<string, string> keyValuePairs, int ttl = 1500)
{
try
{
await Task.WhenAll(keyValuePairs.AsParallel().Select(async item =>
{
var doc = await _cosmosDbRepository.GetItemByKeyAsync(GetId(sessionId, item.Key), sessionId) ?? new Document();
doc.SetPropertyValue("_partitionKey", sessionId);
doc.SetPropertyValue("key", GetId(sessionId, item.Key));
doc.SetPropertyValue("name", item.Key.ToLowerInvariant());
doc.SetPropertyValue("value", item.Value);
doc.TimeToLive = ttl;
await _cosmosDbRepository.UpsertDocumentAsync(doc, "_partitionKey");
}));
return true;
}
catch
{
ApplicationInsightsLogger.TrackException(ex, new Dictionary<string, string>
{
{ "sessionID", sessionId },
{ "action", "TryStoreItems" }
});
return false;
}
}
Upvotes: 1
Views: 1350
Reputation: 131676
The code has serious issues. For eg 100 items, it fires off 100 concurrent operations, 4/8 at a time. The code inside the loop seems to read a document from CosmosDB, set all its properties then call a method named similar to DocumentClient.UpsertDocumentAsync which doesn't need pre-loading anything. Without knowing what _cosmosDbRepository
is and what its methods do, one can only guess. It's possible it creates extra conflicts though by trying to lock stuff while the (probably useless) load/update cycle takes place.
For starters, AsParallel()
is only meant for data parallelism: partition some data in memory and use as many workers are there are cores to crunch each partition. There's no data here though, just calls to async operations. That's why for 100 items, this code will fire off 100 concurrent tasks.
That could hit any number of CosmosDB throttling limits, even if it doesn't cause concurrency conflicts. It could also lead to networking issues, as the same cable is used for all those concurrent connections.
Not taking CosmosDB into account, the correct way to make lots of calls to a remote service is to queue them and execute them with a limited number of workers. This is very easy to do with .NET's ActionBlock. The code could change to something like this :
class Payload
{
public string SessionKey{get;set;}
public string Key{get;set;}
public string Name{get;set;}
public string Value{get;set;}
public int TTL{get;set;}
}
//Allow only 10 concurrent upserts
var options=new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
var upsertBlock=new ActionBlock<Payload>(myPosterAsync,options);
foreach(var payload in payloads)
{
block.Post(pair);
}
//Tell the block we're done
block.Complete();
//Await for all queued operations to complete
await block.Completion;
Where myPosterAsync
contains the posting code :
async Task myPosterAsync(Payload item)
{
try
{
var doc = await _cosmosDbRepository.GetItemByKeyAsync(GetId(item.SessionId, item.Key),
item.SessionId)
?? new Document();
doc.SetPropertyValue("_partitionKey", item.SessionId);
doc.SetPropertyValue("key", GetId(sessionId, item.Key));
doc.SetPropertyValue("name", item.Name);
doc.SetPropertyValue("value", item.Value);
doc.TimeToLive = item.TTL;
await _cosmosDbRepository.UpsertDocumentAsync(doc, "_partitionKey");
catch (Exception ex)
{
//Handle the error in some way, eg log it
ApplicationInsightsLogger.TrackException(ex, new Dictionary<string, string>
{
{ "sessionID", item.SessionId },
{ "action", "TryStoreItems" }
});
}
}
Upvotes: 1