Efie
Efie

Reputation: 1690

Are non-concurrent collections safe inside concurrent collections?

I'm looking to start implementing some concurrent functions in a project that I'm working on. I recently discovered the System.Collections.Concurrent namespace that I am planning to leverage.

The object that I use to track the overall state of the actions is essentially a dictionary with some nested custom objects. My thought is that as long as the highest level collection is configured to be concurrent/thread safe, it doesn't matter if the nested collections are, since the data will be locked by the higher level collection.

Is this the correct assumption?

For example, would something in PowerShell like the following be safe to use?

[System.Collections.Concurrent.ConcurrentDictionary[[String], [MyCustomClass]]]::new()

Additionally, I have some custom class that extend HashSet to avoid duplicates. Since System.Collections. Concurrent don't have a HashSet class, what is the recommended way to get similar functionality but concurrently?

Upvotes: 3

Views: 1336

Answers (1)

Mathias R. Jessen
Mathias R. Jessen

Reputation: 174690

My thought is that as long as the highest level collection is configured to be concurrent/thread safe, it doesn't matter if the nested collections are, since the data will be locked by the higher level collection.

Is this the correct assumption?

No, that's not a safe assumption.

Let's say you create a concurrent dictionary containing a bunch of regular hashtables:

using namespace System.Collections.Concurrent

# Create thread-safe dictionary
$rootDict = [ConcurrentDictionary[string,hashtable]]::new()
  • $rootDict is now thread-safe - more than one thread cannot concurrently modify the 'A' entry by overwriting the reference to the hashtable
  • Any inner hashtable that we add to $rootDict is NOT thread-safe - it'll still just be a regular hashtable

In PowerShell 7 this can be observed when using ForEach-Object -Parallel to operate on such a data structure:

using namespace System.Collections.Concurrent

# Create thread-safe dictionary
$rootDict = [ConcurrentDictionary[string,hashtable]]::new()

1..100 |ForEach-Object -Parallel {
  # We need a reference to our safe top-level dictionary
  $dict = $using:rootDict

  # ... and we need a key
  $rootKey = $_ % 2 -eq 0 ? 'even' : 'odd'

  # Thread-safe acquisition of inner hashtable
  $innerDict = $dict.GetOrAdd($rootKey, {param($key) return @{}})

  # Add a bit of jitter for realism
  Start-Sleep -Milliseconds (Get-Random -Minimum 50 -Maximum 250)

  # Update inner hashtable entry
  $innerDict['Counter'] += 1
} -ThrottleLimit 10

# Are these really the results we're expecting...? 
$rootDict['odd','even']

If the inner hashtable entries were thread-safe to update concurrently, you would expect both counters at 50, but I get results like this on my laptop:

Name                           Value
----                           -----
Counter                        46
Counter                        43

We can see multiple updates to the inner 'Counter' entry are lost in the process, presumably due to concurrent updates.


To test this hypothesis, let's do the same experiment, but with another concurrent dictionary type instead of the hashtable:

using namespace System.Collections.Concurrent

# Create thread-safe dictionary with a thread-safe item type
$rootDict = [ConcurrentDictionary[string,ConcurrentDictionary[string,int]]]::new()

1..100 |ForEach-Object -Parallel {
  # We need a reference to our safe top-level dictionary
  $dict = $using:rootDict

  # ... and we need a key
  $rootKey = $_ % 2 -eq 0 ? 'even' : 'odd'

  # Thread-safe acquisition of inner hashtable
  $innerDict = $dict.GetOrAdd($rootKey, {param($key) return @{}})

  # Add a bit of jitter for realism
  Start-Sleep -Milliseconds (Get-Random -Minimum 50 -Maximum 250)

  # Thread-safe update of inner dictionary
  [void]$innerDict.AddOrUpdate('Counter', {param($key) return 1}, {param($key,$value) return $value + 1})
} -ThrottleLimit 10

# These should be the exact results we're expecting! 
$rootDict['odd','even']

Now I get:

Key     Value
---     -----
Counter    50
Counter    50

I have some custom class that extend HashSet to avoid duplicates. Since System.Collections. Concurrent don't have a HashSet class, what is the recommended way to get similar functionality but concurrently?

Instead of explicitly inheriting from HashSet, I strongly recommend wrapping a HashSet and then guard all the methods you want to expose to the user with a ReaderWriterLockSlim - this way you achieve thread-safety without needlessly sacrificing read access performance.

Here, against using [int] as the sample date type:

using namespace System.Collections.Generic
using namespace System.Threading

class ConcurrentSet
{
    hidden [ReaderWriterLockSlim]
    $_lock

    hidden [HashSet[int]]
    $_set

    ConcurrentSet()
    {
        $this._set = [HashSet[int]]::new()
        $this._lock = [System.Threading.ReaderWriterLockSlim]::new()
    }

    [bool]
    Add([int]$item)
    {
        # Any method that modifies the set should be guarded
        # by a WriteLock - guaranteeing exclusive update access
        $this._lock.EnterWriteLock()
        try{
            return $this._set.Add($item)
        }
        finally{
            $this._lock.ExitWriteLock()
        }
    }

    [bool]
    IsSubsetOf([IEnumerable[int]]$other)
    {
        # For the read-only methods a read-lock will suffice
        $this._lock.EnterReadLock()
        try{
            return $this._set.IsSubsetOf($other)
        }
        finally{
            $this._lock.ExitReadLock()
        }
    }

    # Repeat appropriate lock pattern for all [HashSet] methods you want to expose
}

You can make the wrapper more flexible by wrapping a HashSet<object> and controlling the behavior with a custom comparer

Upvotes: 7

Related Questions