Piggy Chu
Piggy Chu

Reputation: 41

How to understand this threading issue with a dictionary

I wrote a simple thread adding values to a dictionary from many other parts of the project:

public void AddValue(int kid, int vid)
{
    if(!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
    dic[kid].Add(vid);
}

When I ran the code, sometimes it'll show that certain key ID does not exist in the dictionary, I figured that's because different threads are "fighting for it".

But then again, in theory, shouldn't multiple threads fighting for if(!dic.ContainsKey(kid)) dic.Add(kid, new List<int>()); instead since when different threads enter the method without initiating the dictionary should all satisfy the if condition and attempt to Add the key, thus the error should be "The key is already present in the dictionary" instead?

How could one pass the "if" check and still not initiate the key yet?

PS. I'm aware of the AutoResetEvent and probably could make it works without any error, I just don't understand how and why the "if" statement could be bypassed.

Upvotes: 0

Views: 1466

Answers (5)

ShubhamWagh
ShubhamWagh

Reputation: 645

Dictionary is not thread-safe to solve this you can use ConcurrentDictionary or lock so that only one thread can perform the insertion operations.

Without lock or ConcurrentDictionary it'll create a race condition between multiple threads which will be difficult to debug.

Example: You can use a lock object based on your requirements, here I'm just using dic as a lock object.

 public void AddValue(int kid, int vid)
 {
        lock (dic)
        {
            if (!dic.ContainsKey(kid)) dic.Add(kid, new List<int>());
            dic[kid].Add(vid);
        }
 }
    

Upvotes: 0

Theodor Zoulias
Theodor Zoulias

Reputation: 43400

The Dictionary<TKey,TValue> class is not thread-safe. This means that it is expected to be used by one thread at a time. Multiple threads are allowed to interact with the class sequentially, but not concurrently. Otherwise, the behavior of the class is undefined. This means that there is no defined behavior for this class. Microsoft does not provide documentation about the various ways that the class might misbehave, nor provides help through the forums, nor acknowledges any undesirable behavior as a bug. If you use Microsoft's products incorrectly, you are on your own.

You could try and study how the class behaves when it is misused, but it is unlikely to be a good investment of your time. The behavior is dependent on the internal details of the implementation, and these details might change in the next version of the .NET runtime. So unless you have installed defective software on a machine that can't be updated, and it's critical to obtain insights about how the software might misbehave in the next mission of the Mars rover or something, it's far more productive to learn how to write correct multithreaded code, and fix the existing bugs of your application. Here is an online learning resource: Threading in C# - Thread Safety by Joseph Albahari.

The vast majority of the mutable .NET classes are like the Dictionary<TKey,TValue>, they are not thread-safe. Most thread-safe classes exist in the System.Threading and System.Collections.Concurrent namespaces. Classes that are immutable, like the collections in the System.Collections.Immutable namespace, are also thread-safe. A thread-safe class can be used by multiple threads concurrently without synchronization, and the integrity of its internal state is maintained. Using thread-safe classes usually helps at writing correct multithreaded code, but it's not a panacea. It is easy to write defective code, infested by race conditions, if you use thread-safe classes incorrectly.

Upvotes: 1

Jodrell
Jodrell

Reputation: 35696

Here is an example that recreates your problem,

using System;
using System.Collections.Generic;
using System.Linq;
                    
public class Program
{
    public static readonly IDictionary<int, IList<int>> dic = 
        new Dictionary<int, IList<int>>();
    
    public static void Main()
    {
        Enumerable
            .Range(1, 1000000)
            .AsParallel()
            .AsUnordered()
            .ForAll(i => AddValue(i % 100, i));
        
        Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
        Console.WriteLine(
            $"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
    }
    
    public static void AddValue(int kid, int vid)
    {
        if(!dic.ContainsKey(kid))
            dic.Add(kid, new List<int>());
        dic[kid].Add(vid);
    }
}

This throws exceptions with messages like,

"The given key 'n' was not present in the dictionary"

This is happening because the implementation of Dictionary<T> is not thread-safe. When Add is called but, before it returns, there is a short window when the result of ContainsKey differs from the result of the index accessor [key].

This is okay in non-concurrent, single-threaded code, its probably a more optimal implementation. The caller won't use the index accessor until after Add has returned.

However, in concurrent code, running on multiple threads, imagine thread A calls Add, and now we are in the short window where the results of ContainsKey and the index accessor [key] differ, then thread B calls ContainsKey and tries to access the value, the Exception is thrown.

Other times you'll get,

"Object reference not set to an instance of an object."

or

"An item with the same key has already been added. Key: n"

on occasion, the code will run without exception but in the output you'll get two different numbers, neither of which are equal to the expected 1000000.

All these problems are because both Dictionary<TKey, TValue> and List<T> are not thread safe.


If we change the code to,

using System;
using System.Collections.Concurrent;
using System.Linq;
                    
public class Program
{
    public static readonly ConcurrentDictionary<int, ConcurrentQueue<int>> dic =
        new ConcurrentDictionary<int, ConcurrentQueue<int>>();
    
    public static void Main()
    {
        Enumerable
            .Range(1, 1000000)
            .AsParallel()
            .AsUnordered()
            .ForAll(i => AddValue(i % 100, i));
        
        Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
        Console.WriteLine(
            $"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
    }
    
    public static void AddValue(int kid, int vid)
    {
        dic.GetOrAdd(kid, _ => new ConcurrentQueue<int>()).Enqueue(vid);
    }
}

Everything works as anticipated.

Of course, for this trivial example, you'd be better off keeping it simple.

using System;
using System.Linq;
                    
public class Program
{   
    public static void Main()
    {
        var dic = Enumerable
            .Range(1, 1000000)
            .GroupBy(i => i % 100)
            .ToDictionary(g => g.Key, g => g.ToList());
        
        Console.WriteLine($"Total Count: {dic.Sum(p => p.Value.Count)}");
        Console.WriteLine(
            $"Total Distinct Count: {dic.SelectMany(p => p.Value).Distinct().Count()}");
    }
}

Upvotes: 4

Solomon Slow
Solomon Slow

Reputation: 27115

Not sure I understand what actually did happen when you ran your program, but here's a thing that could happen:

thread A                               thread B
Enters AddValue(5,17)                     -
calls dic.ContainsKey(5)               Enters AddValue(5,12)
gets result, false                     calls dic.ContainsKey(5)
calls new List<int>()                  gets result, false
adds empty list A as dic[5]            calls new List<int>()
adds 17 to previously empty list A     adds empty list B as dic[5],
   -                                       overwriting list A
   -                                   adds 12 to previously empty list B 

End result, after both calls have finished, the list, dic[5], contains only one number, 12.


Here's a worse thing that could happen:

thread A                               thread B
Enters AddValue(2,42)                     -
calls dic.ContainsKey(2)                  -
gets result, false                        -
calls new List<int>()                     -
calls dic.add(5,...new list...)        Enters AddValue(7,99)
    in-process of mutating dic         calls dic.ContainsKey(7)
    still in process                       encounters corrupt data structure
    returns                                returns wrong result or maybe
returns                                    crashes the program
   -                                   ...

End results; thread B could wipe out the existing list for kid=7 because dic.ContainsKey(7) gave the wrong answer, or thread B may have attempted to call dic[7].add(99) when dic[7] is null (same reason), or program could have crashed within dic.ContainsKey(7).

Upvotes: 0

Roman Ryzhiy
Roman Ryzhiy

Reputation: 1636

I believe you use Dictionary<>. It is not thread safe. A Dictionary can support multiple readers concurrently, as long as the collection is not modified (not your case), so you'd to use ConcurrentDictionary instead https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2?view=net-7.0.

Upvotes: 0

Related Questions