Reputation: 5461
This is probably a very common problem which has a lot of answers. I was not able to get to an answer because I am not very sure how to search for it.
I have two collections of objects - both come from the database, and in some cases those collections are of the same object type. Further, I need to do some operations for every combination of those collections. So, for example:
foreach(var a in collection1){
foreach(var b in collection2){
if(a.Name == b.Name && a.Value != b.Value)
//do something with this combination
else
//do something else
}
}
This is very inefficient and it gets slower based on the number of objects in both collections.
What is the best way to solve this type of problems?
EDIT:
I am using .NET 4 at the moment so I am also interested in suggestions using Parallelism to speed that up.
EDIT 2: I have added above an example of the business rules that need to be performed on each combination of objects. However, the business rules defined in the example can vary.
EDIT 3: For example, inside the loop the following will be done: If the business rules are satisfied (see above) a record will be created in the database with a reference to object A and object B. This is one of the operations that I need to do. (Operations will be configurable from child classes using this class).
Upvotes: 2
Views: 278
Reputation: 133995
If you really have to to process every item in list b for each item in list a, then it's going to take time proportional to a.Count * b.Count
. There's nothing you can do to prevent it. Adding parallel processing will give you a linear speedup, but that's not going to make a dent in the processing time if the lists are even moderately large.
How large are these lists? Do you really have to check every combination of a
and b
? Can you give us some more information about the problem you're trying to solve? I suspect that there's a way to bring a more efficient algorithm to bear, which would reduce your processing time by orders of magnitude.
Edit after more info posted
I know that the example you posted is just an example, but it shows that you can find a better algorithm for at least some of your cases. In this particular example, you could sort a
and b
by name, and then do a straight merge. Or, you could sort b
into an array or list, and use binary search to look up the names. Either of those two options would perform much better than your nested loops. So much better, in fact, that you probably wouldn't need to bother with parallelizing things.
Look at the numbers. If your a
has 4,000 items in it and b
has 100,000 items in it, your nested loop will do 400 million comparisons (a.Count * b.Count
). But sorting is only n log n
, and the merge is linear. So sorting and then merging will be approximately (a.Count * 12) + (b.Count * 17) + a.Count + b.Count
, or in the neighborhood of 2 million comparisons. So that's approximately 200 times faster.
Compare that to what you can do with parallel processing: only a linear speedup. If you have four cores and you get a pure linear speedup, you'll only cut your time by a factor of four. The better algorithm cut the time by a factor of 200, with a single thread.
You just need to find better algorithms.
LINQ might also provide a good solution. I'm not an expert with LINQ, but it seems like it should be able to make quick work of something like this.
Upvotes: 3
Reputation: 30097
Parallel.ForEach(a, currentA => Parallel.ForEach(b, currentB =>
{
// do something with currentA and currentB
}));
Upvotes: 1
Reputation: 27944
First of all, there is a reason you are searching with a value from the first collection in the second collection.
For example if you want to know that a value excites in the the second collection, you should put the second collection in a hashset, this will allow you to do a fast lookup. Creating the HashSet and accessing it is like 1 vs n for looping the collection.
Upvotes: 1
Reputation: 3408
If you need to check all the variants one by one you can't do anything better. BUT you can parallel the loops. For ex if you are using c# 4.0 you can use parallel foreach loop.
You can find an example here... http://msdn.microsoft.com/en-us/library/dd460720.aspx
foreach(var a in collection1){
Parallel.ForEach(collection2, b =>
{
//do something with a and b
} //close lambda expression
);
}
In the same way you can parallel the first loop as well.
Upvotes: 1