Reputation: 41
I wrote a simple thread adding values to a dictionary from many other parts of the project:
public void AddValue(int kid, int vid)
{
if(!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
dic[kid].Add(vid);
}
When I ran the code, sometimes it'll show that certain key ID does not exist in the dictionary, I figured that's because different threads are "fighting for it".
But then again, in theory, shouldn't multiple threads fighting for if(!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
instead since when different threads enter the method without initiating the dictionary should all satisfy the if condition and attempt to Add
the key, thus the error should be "The key is already present in the dictionary" instead?
How could one pass the "if" check and still not initiate the key yet?
PS. I'm aware of the AutoResetEvent
and probably could make it works without any error, I just don't understand how and why the "if" statement could be bypassed.
Upvotes: 0
Views: 1466
Reputation: 645
Dictionary is not thread-safe to solve this you can use ConcurrentDictionary
or lock
so that only one thread can perform the insertion operations.
Without lock
or ConcurrentDictionary
it'll create a race condition between multiple threads which will be difficult to debug.
Example: You can use a lock object based on your requirements, here I'm just using dic
as a lock object.
public void AddValue(int kid, int vid)
{
lock (dic)
{
if (!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
dic[kid].Add(vid);
}
}
Upvotes: 0
Reputation: 43400
The Dictionary<TKey,TValue>
class is not thread-safe. This means that it is expected to be used by one thread at a time. Multiple threads are allowed to interact with the class sequentially, but not concurrently. Otherwise, the behavior of the class is undefined. This means that there is no defined behavior for this class. Microsoft does not provide documentation about the various ways that the class might misbehave, nor provides help through the forums, nor acknowledges any undesirable behavior as a bug. If you use Microsoft's products incorrectly, you are on your own.
You could try and study how the class behaves when it is misused, but it is unlikely to be a good investment of your time. The behavior is dependent on the internal details of the implementation, and these details might change in the next version of the .NET runtime. So unless you have installed defective software on a machine that can't be updated, and it's critical to obtain insights about how the software might misbehave in the next mission of the Mars rover or something, it's far more productive to learn how to write correct multithreaded code, and fix the existing bugs of your application. Here is an online learning resource: Threading in C# - Thread Safety by Joseph Albahari.
The vast majority of the mutable .NET classes are like the Dictionary<TKey,TValue>
, they are not thread-safe. Most thread-safe classes exist in the System.Threading
and System.Collections.Concurrent
namespaces. Classes that are immutable, like the collections in the System.Collections.Immutable
namespace, are also thread-safe. A thread-safe class can be used by multiple threads concurrently without synchronization, and the integrity of its internal state is maintained. Using thread-safe classes usually helps at writing correct multithreaded code, but it's not a panacea. It is easy to write defective code, infested by race conditions, if you use thread-safe classes incorrectly.
Upvotes: 1
Reputation: 35696
Here is an example that recreates your problem,
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static readonly IDictionary<int, IList<int>> dic =
new Dictionary<int, IList<int>>();
public static void Main()
{
Enumerable
.Range(1, 1000000)
.AsParallel()
.AsUnordered()
.ForAll(i => AddValue(i % 100, i));
Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
Console.WriteLine(
$"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
}
public static void AddValue(int kid, int vid)
{
if(!dic.ContainsKey(kid))
dic.Add(kid, new List<int>());
dic[kid].Add(vid);
}
}
This throws exceptions with messages like,
"The given key 'n' was not present in the dictionary"
This is happening because the implementation of Dictionary<T>
is not thread-safe. When Add
is called but, before it returns, there is a short window when the result of ContainsKey
differs from the result of the index accessor [key]
.
This is okay in non-concurrent, single-threaded code, its probably a more optimal implementation. The caller won't use the index accessor until after Add
has returned.
However, in concurrent code, running on multiple threads, imagine thread A calls Add
, and now we are in the short window where the results of ContainsKey
and the index accessor [key]
differ, then thread B calls ContainsKey
and tries to access the value, the Exception is thrown.
Other times you'll get,
"Object reference not set to an instance of an object."
or
"An item with the same key has already been added. Key: n"
on occasion, the code will run without exception but in the output you'll get two different numbers, neither of which are equal to the expected 1000000
.
All these problems are because both Dictionary<TKey, TValue>
and List<T>
are not thread safe.
If we change the code to,
using System;
using System.Collections.Concurrent;
using System.Linq;
public class Program
{
public static readonly ConcurrentDictionary<int, ConcurrentQueue<int>> dic =
new ConcurrentDictionary<int, ConcurrentQueue<int>>();
public static void Main()
{
Enumerable
.Range(1, 1000000)
.AsParallel()
.AsUnordered()
.ForAll(i => AddValue(i % 100, i));
Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
Console.WriteLine(
$"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
}
public static void AddValue(int kid, int vid)
{
dic.GetOrAdd(kid, _ => new ConcurrentQueue<int>()).Enqueue(vid);
}
}
Everything works as anticipated.
Of course, for this trivial example, you'd be better off keeping it simple.
using System;
using System.Linq;
public class Program
{
public static void Main()
{
var dic = Enumerable
.Range(1, 1000000)
.GroupBy(i => i % 100)
.ToDictionary(g => g.Key, g => g.ToList());
Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
Console.WriteLine(
$"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
}
}
Upvotes: 4
Reputation: 27115
Not sure I understand what actually did happen when you ran your program, but here's a thing that could happen:
thread A thread B
Enters AddValue(5,17) -
calls dic.ContainsKey(5) Enters AddValue(5,12)
gets result, false calls dic.ContainsKey(5)
calls new List<int>() gets result, false
adds empty list A as dic[5] calls new List<int>()
adds 17 to previously empty list A adds empty list B as dic[5],
- overwriting list A
- adds 12 to previously empty list B
End result, after both calls have finished, the list, dic[5]
, contains only one number, 12
.
Here's a worse thing that could happen:
thread A thread B
Enters AddValue(2,42) -
calls dic.ContainsKey(2) -
gets result, false -
calls new List<int>() -
calls dic.add(5,...new list...) Enters AddValue(7,99)
in-process of mutating dic calls dic.ContainsKey(7)
still in process encounters corrupt data structure
returns returns wrong result or maybe
returns crashes the program
- ...
End results; thread B could wipe out the existing list for kid=7 because dic.ContainsKey(7)
gave the wrong answer, or thread B may have attempted to call dic[7].add(99)
when dic[7]
is null (same reason), or program could have crashed within dic.ContainsKey(7)
.
Upvotes: 0
Reputation: 1636
I believe you use Dictionary<>
. It is not thread safe. A Dictionary
can support multiple readers concurrently, as long as the collection is not modified (not your case), so you'd to use ConcurrentDictionary
instead https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=net-7.0.
Upvotes: 0