Reputation: 1690
I'm looking to start implementing some concurrent functions in a project that I'm working on. I recently discovered the System.Collections.Concurrent
namespace that I am planning to leverage.
The object that I use to track the overall state of the actions is essentially a dictionary with some nested custom objects. My thought is that as long as the highest level collection is configured to be concurrent/thread safe, it doesn't matter if the nested collections are, since the data will be locked by the higher level collection.
Is this the correct assumption?
For example, would something in PowerShell like the following be safe to use?
[System.Collections.Concurrent.ConcurrentDictionary[[String], [MyCustomClass]]]::new()
Additionally, I have some custom class that extend HashSet to avoid duplicates. Since System.Collections. Concurrent don't have a HashSet class, what is the recommended way to get similar functionality but concurrently?
Upvotes: 3
Views: 1336
Reputation: 174690
My thought is that as long as the highest level collection is configured to be concurrent/thread safe, it doesn't matter if the nested collections are, since the data will be locked by the higher level collection.
Is this the correct assumption?
No, that's not a safe assumption.
Let's say you create a concurrent dictionary containing a bunch of regular hashtables:
using namespace System.Collections.Concurrent
# Create thread-safe dictionary
$rootDict = [ConcurrentDictionary[string,hashtable]]::new()
$rootDict
is now thread-safe - more than one thread cannot concurrently modify the 'A'
entry by overwriting the reference to the hashtable$rootDict
is NOT thread-safe - it'll still just be a regular hashtableIn PowerShell 7 this can be observed when using ForEach-Object -Parallel
to operate on such a data structure:
using namespace System.Collections.Concurrent
# Create thread-safe dictionary
$rootDict = [ConcurrentDictionary[string,hashtable]]::new()
1..100 |ForEach-Object -Parallel {
# We need a reference to our safe top-level dictionary
$dict = $using:rootDict
# ... and we need a key
$rootKey = $_ % 2 -eq 0 ? 'even' : 'odd'
# Thread-safe acquisition of inner hashtable
$innerDict = $dict.GetOrAdd($rootKey, {param($key) return @{}})
# Add a bit of jitter for realism
Start-Sleep -Milliseconds (Get-Random -Minimum 50 -Maximum 250)
# Update inner hashtable entry
$innerDict['Counter'] += 1
} -ThrottleLimit 10
# Are these really the results we're expecting...?
$rootDict['odd','even']
If the inner hashtable entries were thread-safe to update concurrently, you would expect both counters at 50
, but I get results like this on my laptop:
Name Value
---- -----
Counter 46
Counter 43
We can see multiple updates to the inner 'Counter' entry are lost in the process, presumably due to concurrent updates.
To test this hypothesis, let's do the same experiment, but with another concurrent dictionary type instead of the hashtable:
using namespace System.Collections.Concurrent
# Create thread-safe dictionary with a thread-safe item type
$rootDict = [ConcurrentDictionary[string,ConcurrentDictionary[string,int]]]::new()
1..100 |ForEach-Object -Parallel {
# We need a reference to our safe top-level dictionary
$dict = $using:rootDict
# ... and we need a key
$rootKey = $_ % 2 -eq 0 ? 'even' : 'odd'
# Thread-safe acquisition of inner hashtable
$innerDict = $dict.GetOrAdd($rootKey, {param($key) return @{}})
# Add a bit of jitter for realism
Start-Sleep -Milliseconds (Get-Random -Minimum 50 -Maximum 250)
# Thread-safe update of inner dictionary
[void]$innerDict.AddOrUpdate('Counter', {param($key) return 1}, {param($key,$value) return $value + 1})
} -ThrottleLimit 10
# These should be the exact results we're expecting!
$rootDict['odd','even']
Now I get:
Key Value
--- -----
Counter 50
Counter 50
I have some custom class that extend HashSet to avoid duplicates. Since System.Collections. Concurrent don't have a HashSet class, what is the recommended way to get similar functionality but concurrently?
Instead of explicitly inheriting from HashSet
, I strongly recommend wrapping a HashSet
and then guard all the methods you want to expose to the user with a ReaderWriterLockSlim
- this way you achieve thread-safety without needlessly sacrificing read access performance.
Here, against using [int]
as the sample date type:
using namespace System.Collections.Generic
using namespace System.Threading
class ConcurrentSet
{
hidden [ReaderWriterLockSlim]
$_lock
hidden [HashSet[int]]
$_set
ConcurrentSet()
{
$this._set = [HashSet[int]]::new()
$this._lock = [System.Threading.ReaderWriterLockSlim]::new()
}
[bool]
Add([int]$item)
{
# Any method that modifies the set should be guarded
# by a WriteLock - guaranteeing exclusive update access
$this._lock.EnterWriteLock()
try{
return $this._set.Add($item)
}
finally{
$this._lock.ExitWriteLock()
}
}
[bool]
IsSubsetOf([IEnumerable[int]]$other)
{
# For the read-only methods a read-lock will suffice
$this._lock.EnterReadLock()
try{
return $this._set.IsSubsetOf($other)
}
finally{
$this._lock.ExitReadLock()
}
}
# Repeat appropriate lock pattern for all [HashSet] methods you want to expose
}
You can make the wrapper more flexible by wrapping a HashSet<object>
and controlling the behavior with a custom comparer
Upvotes: 7