Big Daddy
Big Daddy

Reputation: 5224

Parallel ForEach and ConcurrentBag

I've got a ConcurrentBag exposed to read/write operations inside a Parallel.ForEach. Basically, I need to check for the existence of the object in the bag based on several properties and if there's no match, then I add it to the bag. It's really, really slow. Using a List<> without a lock is a fraction of the time. What's wrong with this code? Am I better of using a list locking with a ReaderWriterLockSlim? I am processing about 1,000,000 objects here.

var bag = new ConcurrentBag<Beneficiary>();

Parallel.ForEach(cx, _options, line =>
{
if (!bag.Any(o =>
       o.WinID == beneficiary.WinID &&
       o.ProductType == beneficiary.ProductType &&
       o.FirstName == beneficiary.FirstName &&
       o.LastName == beneficiary.LastName &&
       o.MiddleName == beneficiary.MiddleName))
{
       bag.Add(beneficiary);    
}
}

Upvotes: 2

Views: 4814

Answers (3)

Seb
Seb

Reputation: 1230

You can use a Tuple as key and use a ConcurrentDictionary to store your benificiary object.

var dict = new ConcurrentDictionary<Tuple<int, object, string>, Beneficiary>();

Parallel.ForEach(cx, _options, line =>
{
    string fullname = string.Join("|", line.FirstName, line.LastName, line.MiddleName);

    Tuple<int, object, string> key = new Tuple<int,object,string>(line.WinID, line.ProductType, fullname);

    //if (!dict.ContainsKey(key)) optional line
    {
        dict.TryAdd(key, line);}
    }
});

Once the parallel.ForEach is completed, you can access the distinct beneficiary using a simple ForEach.

Note: you should replace the "object" type by the typeOf ProductType.

Upvotes: 2

Servy
Servy

Reputation: 203828

So first off, the solution that you have there isn't type safe at all. Items can be added to the collection while you're iterating through the copy, or even after you perform the Any but before you call Add. You're also doing a linear search, which isn't going to perform well at all. You're better off both using a dictionary based structure which will have faster lookup, and you also need to make sure that the entire method you have here is logically atomic.

What you could do is use a ConcurrentDictionary and create an IEqualityComparer that checks the 5 properties that you care about, which will allow you to add the items to the dictionary while overwriting duplicates.

Of course, this is all only meaningful if creating each object actually takes a lot of work. If all you're trying to do is get a collection of distinct items then it's likely that trying to parallelize that operation is just not going to be a win. If that's basically all you're doing then the resources each thread needs to do its work are going to be in such contention that the amount of actual parallelization you'll have will be very low, almost certainly way less than the overhead of threading will cost you. You'd likely be better off just using a synchronous Distinct call instead.

Upvotes: 3

Zer0
Zer0

Reputation: 7354

A ConcurrentBag<T> isn't optimized for this type of scenario. It is implemented using ThreadLocal<T> which makes your particular use case slow. You're iterating over the entire collection on many threads repeatedly. Iterating over an entire collection to check for the existence of an object is also slow.

I would suggest overloading Beneficiary.GetHashCode and using a ConcurrentDictionary<Beneficiary, byte>. The byte value can just be ignored and it's effectively a concurrent hashset.

Upvotes: 3

Related Questions