Reputation: 71
Please help me to convert following loop to Parallel loop. I tried using Parallel.ForEach and ConcurrentBag instead of HashSet, but the wied thing is that "Matched" returns every time different results.
I can't figure it out... Is it because of Thread Safety issues?
Keywords list contains about 500 unique strings, each 1-3 words in lenght.
Items contains about 10000 items.
Original code:
Dim Items As IEnumerable(Of Item) = Db.Items.GetAll
Dim Keywords As HashSet(Of String)
Dim Matched As HashSet(Of Item)
For Each Item In Items
For Each Keyword In Keywords
If Regex.IsMatch(Headline, String.Format("\b{0}\b", Keyword), RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant) Then
If Not Matched.Contains(Item) Then
Matched.Add(Item)
End If
End If
Next
Next
Attempt to convert it to
Dim Items As IEnumerable(Of Item) = Db.Items.GetAll
Dim Keywords As HashSet(Of String)
Dim Matched As Concurrent.ConcurrentBag(Of Item)
Threading.Tasks.Parallel.ForEach(Of Item)(Items, Sub(Item)
For Each Keyword In Keywords
If Regex.IsMatch(Item.Title, String.Format("\b{0}\b", Keyword), RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant) Then
If Not Matched.Contains(Item) Then
Matched.Add(Item)
End If
Continue For
End If
Next
End If
Upvotes: 2
Views: 1696
Reputation: 244827
Yes, your code certainly isn't thread-safe. Using thread-safe collections won't make your code automatically thread-safe, you still need to use them correctly.
Your problem is that after Contains()
finishes but before Add()
is called on one thread, Add()
can be called on another thread (and the same can possibly also happen while Contains()
executes).
What you need to do is to:
ConcurrentHashSet
. There is no such class in .Net, but you can use ConcurrentDictionary
instead (even if it doesn't fit your needs exactly). Instead of your call to Contains()
and then Add()
, you could do Matched.TryAdd(Item, True)
, where True
is there just because ConcurrentDictionary
needs some value.Upvotes: 3