Paul
Paul

Reputation: 21975

Lock for ConcurrentDictionary when AddOrUpdate-ing?

I use a ConcurrentDictioanry<string, HashSet<string>> to access some data across many threads.

I read in this article (scroll down) that the method AddOrUpdate is not executed in the lock, so it could endanger thread-safety.

My code is as follows:

//keys and bar are not the concern here
ConcurrentDictioanry<string, HashSet<string>> foo = new ...;
foreach(var key in keys) {
    foo.AddOrUpdate(key, new HashSet<string> { bar }, (key, val) => {
        val.Add(bar);
        return val;
    });
}

Should I enclose the AddOrUpdate call in a lock statement in order to be sure everything is thread-safe?

Upvotes: 7

Views: 6567

Answers (5)

Theodor Zoulias
Theodor Zoulias

Reputation: 43515

Should I enclose the AddOrUpdate call in a lock statement in order to be sure everything is thread-safe?

No. It's not that simple. The AddOrUpdate is an advanced API, that should be used only when the TValue is immutable. You should have a clear understanding of how this method works before using it. This method invokes repeatedly the updateValueFactory delegate, competing with other threads, until it wins the race to update the dictionary. The updateValueFactory is invoked without any synchronization, so multiple threads can invoke it concurrently for the same key. The concurrency pattern is optimistic: each thread assumes optimistically that it will win the race, so it invokes the delegate and then tries to store the produced value in the dictionary. In case it is preempted¹ by another thread, it repeats the same pattern. It invokes the updateValueFactory again, passing as existing argument the value that was stored successfully by the other thread. Apparently Microsoft decided that using a pessimistic approach, i.e. invoking the delegate while holding an exclusive lock on the key, would be comparatively more dangerous because it could result in a deadlock.

You have basically three options:

  1. Keep using the HashSet<T>, but protect meticulously the HashSet<T> instances from concurrent accesses by using the lock statement.
  2. Replace the HashSet<T> with a mutable thread-safe type like the ConcurrentDictionary<T,bool> (nesting one dictionary inside the other).
  3. Replace the HashSet<T> with an immutable thread-safe type like the ImmutableHashSet<T> (example here).

Since you can't use the AddOrUpdate method with a HashSet<T>, because it's a mutable type, you should use instead the GetOrAdd method. Example:

HashSet<string> val = foo.GetOrAdd(key, static _ => new HashSet<string>());
lock (val) val.Add(bar);

¹ The test for preemption is performed currently (.NET 9) by comparing the captured value of the key with the current value. If the values are equal, it is assumed that the current thread was not preempted, and so that the current thread won the race and the produced value can be stored in the dictionary. This approach relies on invoking the TValue.Equals, which can be prohibitively expensive. This potential performance trap in not documented. Microsoft is aware of this, and is OK with it because no one has complained (yet). This strategy has the effect of masking incorrect uses of the AddOrUpdate method, specifically using a mutable TValue, mutating the existing value inside the updateValueFactory delegate, and then returning the existing value. Alternative testing strategies are possible, and if ever used they would expose the incorrect AddOrUpdate usages by causing programs to malfunction (occasionally the updateValueFactory would be invoked more than once per AddorUpdate operation). The probability of a strategy change in the future is slim, but nevertheless I would consider relying on the existing behavior, when the correctness of the program is at stake, as unwise.

Upvotes: 1

eaglei22
eaglei22

Reputation: 2831

Seems the better answer would be to use Lazy, per this article on the methods that pass in a delegate.

Also another good article Here on Lazy loading the add delegate.

Upvotes: 0

Jon Skeet
Jon Skeet

Reputation: 1500525

Locking during AddOrUpdate on its own wouldn't help - you'd still have to lock every time you read from the set.

If you're going to treat this collection as thread-safe, you really need the values to be thread-safe too. You need a ConcurrentSet, ideally. Now that doesn't exist within the framework (unless I've missed something) but you could probably create your own ConcurrentSet<T> which used a ConcurrentDictionary<T, int> (or whatever TValue you like) as its underlying data structure. Basically you'd ignore the value within the dictionary, and just treat the presence of the key as the important part.

You don't need to implement everything within ISet<T> - just the bits you actually need.

You'd then create a ConcurrentDictionary<string, ConcurrentSet<string>> in your application code, and you're away - no need for locking.

Upvotes: 8

Hans Passant
Hans Passant

Reputation: 941465

You'll need to fix this code, it creates a lot of garbage. You create a new HashSet even if none is required. Use the other overload, the one that accepts the valueFactory delegate. So the HashSet is only created when the key isn't yet present in the dictionary.

The valueFactory might be called multiple times if multiple threads concurrently try to add the same value of key and it is not present. Very low odds but not zero. Only one of these hashsets will be used. Not a problem, creating the HashSet has no side effects that could cause threading trouble, the extra copies just get garbage collected.

Upvotes: 4

fsimonazzi
fsimonazzi

Reputation: 3013

The article states that the add delegate is not executed in the dictionary's lock, and that the element you get might not be the element created in that thread by the add delegate. That's not a thread safety issue; the dictionary's state will be consistent and all callers will get the same instance, even if a different instance was created for each of them (and all but one get dropped).

Upvotes: 0

Related Questions