jjarv
jjarv

Reputation: 71

Parallel.ForEach gives different results each time

Please help me to convert following loop to Parallel loop. I tried using Parallel.ForEach and ConcurrentBag instead of HashSet, but the wied thing is that "Matched" returns every time different results.

I can't figure it out... Is it because of Thread Safety issues?

Keywords list contains about 500 unique strings, each 1-3 words in lenght.

Items contains about 10000 items.

Original code:

    Dim Items As IEnumerable(Of Item) = Db.Items.GetAll

    Dim Keywords As HashSet(Of String) 
    Dim Matched As HashSet(Of Item)

    For Each Item In Items

        For Each Keyword In Keywords

            If Regex.IsMatch(Headline, String.Format("\b{0}\b", Keyword), RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant) Then 
                If Not Matched.Contains(Item) Then
                    Matched.Add(Item)
                End If

            End If

        Next

    Next

Attempt to convert it to

Dim Items As IEnumerable(Of Item) = Db.Items.GetAll

Dim Keywords As HashSet(Of String) 
Dim Matched As Concurrent.ConcurrentBag(Of Item)

Threading.Tasks.Parallel.ForEach(Of Item)(Items, Sub(Item)
    For Each Keyword In Keywords
        If Regex.IsMatch(Item.Title, String.Format("\b{0}\b", Keyword), RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant) Then
            If Not Matched.Contains(Item) Then
            Matched.Add(Item)
            End If
       Continue For
      End If
   Next
End If

Upvotes: 2

Views: 1696

Answers (1)

svick
svick

Reputation: 244827

Yes, your code certainly isn't thread-safe. Using thread-safe collections won't make your code automatically thread-safe, you still need to use them correctly.

Your problem is that after Contains() finishes but before Add() is called on one thread, Add() can be called on another thread (and the same can possibly also happen while Contains() executes).

What you need to do is to:

  1. Use locking (this means you won't need to use a thread safe collection anymore); or
  2. Use something like ConcurrentHashSet. There is no such class in .Net, but you can use ConcurrentDictionary instead (even if it doesn't fit your needs exactly). Instead of your call to Contains() and then Add(), you could do Matched.TryAdd(Item, True), where True is there just because ConcurrentDictionary needs some value.

Upvotes: 3

Related Questions