Reputation: 5224
I've got a ConcurrentBag
exposed to read/write operations inside a Parallel.ForEach
. Basically, I need to check for the existence of the object in the bag based on several properties and if there's no match, then I add it to the bag. It's really, really slow. Using a List<>
without a lock is a fraction of the time. What's wrong with this code? Am I better of using a list locking with a ReaderWriterLockSlim
? I am processing about 1,000,000 objects here.
var bag = new ConcurrentBag<Beneficiary>();
Parallel.ForEach(cx, _options, line =>
{
if (!bag.Any(o =>
o.WinID == beneficiary.WinID &&
o.ProductType == beneficiary.ProductType &&
o.FirstName == beneficiary.FirstName &&
o.LastName == beneficiary.LastName &&
o.MiddleName == beneficiary.MiddleName))
{
bag.Add(beneficiary);
}
}
Upvotes: 2
Views: 4814
Reputation: 1230
You can use a Tuple
as key and use a ConcurrentDictionary
to store your benificiary
object.
var dict = new ConcurrentDictionary<Tuple<int, object, string>, Beneficiary>();
Parallel.ForEach(cx, _options, line =>
{
string fullname = string.Join("|", line.FirstName, line.LastName, line.MiddleName);
Tuple<int, object, string> key = new Tuple<int,object,string>(line.WinID, line.ProductType, fullname);
//if (!dict.ContainsKey(key)) optional line
{
dict.TryAdd(key, line);}
}
});
Once the parallel.ForEach
is completed, you can access the distinct beneficiary using a simple ForEach
.
Note: you should replace the "object" type by the typeOf ProductType.
Upvotes: 2
Reputation: 203828
So first off, the solution that you have there isn't type safe at all. Items can be added to the collection while you're iterating through the copy, or even after you perform the Any
but before you call Add
. You're also doing a linear search, which isn't going to perform well at all. You're better off both using a dictionary based structure which will have faster lookup, and you also need to make sure that the entire method you have here is logically atomic.
What you could do is use a ConcurrentDictionary
and create an IEqualityComparer
that checks the 5 properties that you care about, which will allow you to add the items to the dictionary while overwriting duplicates.
Of course, this is all only meaningful if creating each object actually takes a lot of work. If all you're trying to do is get a collection of distinct items then it's likely that trying to parallelize that operation is just not going to be a win. If that's basically all you're doing then the resources each thread needs to do its work are going to be in such contention that the amount of actual parallelization you'll have will be very low, almost certainly way less than the overhead of threading will cost you. You'd likely be better off just using a synchronous Distinct
call instead.
Upvotes: 3
Reputation: 7354
A ConcurrentBag<T>
isn't optimized for this type of scenario. It is implemented using ThreadLocal<T>
which makes your particular use case slow. You're iterating over the entire collection on many threads repeatedly. Iterating over an entire collection to check for the existence of an object is also slow.
I would suggest overloading Beneficiary.GetHashCode
and using a ConcurrentDictionary<Beneficiary, byte>
. The byte value can just be ignored and it's effectively a concurrent hashset.
Upvotes: 3