Reputation: 21975
I use a ConcurrentDictioanry<string, HashSet<string>>
to access some data across many threads.
I read in this article (scroll down) that the method AddOrUpdate
is not executed in the lock, so it could endanger thread-safety.
My code is as follows:
//keys and bar are not the concern here
ConcurrentDictioanry<string, HashSet<string>> foo = new ...;
foreach(var key in keys) {
foo.AddOrUpdate(key, new HashSet<string> { bar }, (key, val) => {
val.Add(bar);
return val;
});
}
Should I enclose the AddOrUpdate
call in a lock
statement in order to be sure everything is thread-safe?
Upvotes: 7
Views: 6567
Reputation: 43515
Should I enclose the
AddOrUpdate
call in alock
statement in order to be sure everything is thread-safe?
No. It's not that simple. The AddOrUpdate
is an advanced API, that should be used only when the TValue
is immutable. You should have a clear understanding of how this method works before using it. This method invokes repeatedly the updateValueFactory
delegate, competing with other threads, until it wins the race to update the dictionary. The updateValueFactory
is invoked without any synchronization, so multiple threads can invoke it concurrently for the same key. The concurrency pattern is optimistic: each thread assumes optimistically that it will win the race, so it invokes the delegate and then tries to store the produced value in the dictionary. In case it is preempted¹ by another thread, it repeats the same pattern. It invokes the updateValueFactory
again, passing as existing
argument the value that was stored successfully by the other thread. Apparently Microsoft decided that using a pessimistic approach, i.e. invoking the delegate while holding an exclusive lock on the key, would be comparatively more dangerous because it could result in a deadlock.
You have basically three options:
HashSet<T>
, but protect meticulously the HashSet<T>
instances from concurrent accesses by using the lock
statement.HashSet<T>
with a mutable thread-safe type like the ConcurrentDictionary<T,bool>
(nesting one dictionary inside the other).HashSet<T>
with an immutable thread-safe type like the ImmutableHashSet<T>
(example here).Since you can't use the AddOrUpdate
method with a HashSet<T>
, because it's a mutable type, you should use instead the GetOrAdd
method. Example:
HashSet<string> val = foo.GetOrAdd(key, static _ => new HashSet<string>());
lock (val) val.Add(bar);
¹ The test for preemption is performed currently (.NET 9) by comparing the captured value of the key with the current value. If the values are equal, it is assumed that the current thread was not preempted, and so that the current thread won the race and the produced value can be stored in the dictionary. This approach relies on invoking the TValue.Equals
, which can be prohibitively expensive. This potential performance trap in not documented. Microsoft is aware of this, and is OK with it because no one has complained (yet). This strategy has the effect of masking incorrect uses of the AddOrUpdate
method, specifically using a mutable TValue
, mutating the existing value inside the updateValueFactory
delegate, and then returning the existing value. Alternative testing strategies are possible, and if ever used they would expose the incorrect AddOrUpdate
usages by causing programs to malfunction (occasionally the updateValueFactory
would be invoked more than once per AddorUpdate
operation). The probability of a strategy change in the future is slim, but nevertheless I would consider relying on the existing behavior, when the correctness of the program is at stake, as unwise.
Upvotes: 1
Reputation: 2831
Seems the better answer would be to use Lazy, per this article on the methods that pass in a delegate.
Also another good article Here on Lazy loading the add delegate.
Upvotes: 0
Reputation: 1500525
Locking during AddOrUpdate
on its own wouldn't help - you'd still have to lock every time you read from the set.
If you're going to treat this collection as thread-safe, you really need the values to be thread-safe too. You need a ConcurrentSet
, ideally. Now that doesn't exist within the framework (unless I've missed something) but you could probably create your own ConcurrentSet<T>
which used a ConcurrentDictionary<T, int>
(or whatever TValue
you like) as its underlying data structure. Basically you'd ignore the value within the dictionary, and just treat the presence of the key as the important part.
You don't need to implement everything within ISet<T>
- just the bits you actually need.
You'd then create a ConcurrentDictionary<string, ConcurrentSet<string>>
in your application code, and you're away - no need for locking.
Upvotes: 8
Reputation: 941465
You'll need to fix this code, it creates a lot of garbage. You create a new HashSet even if none is required. Use the other overload, the one that accepts the valueFactory delegate. So the HashSet is only created when the key isn't yet present in the dictionary.
The valueFactory might be called multiple times if multiple threads concurrently try to add the same value of key and it is not present. Very low odds but not zero. Only one of these hashsets will be used. Not a problem, creating the HashSet has no side effects that could cause threading trouble, the extra copies just get garbage collected.
Upvotes: 4
Reputation: 3013
The article states that the add delegate is not executed in the dictionary's lock, and that the element you get might not be the element created in that thread by the add delegate. That's not a thread safety issue; the dictionary's state will be consistent and all callers will get the same instance, even if a different instance was created for each of them (and all but one get dropped).
Upvotes: 0