Jamona Mican
Jamona Mican

Reputation: 1654

Does the order of HashSet.Intersect() matter for performance?

I'm guessing it does NOT, but if someone could confirm.

If I try to intersect two sets: A (1 million items) B (1 item)

Does the framework always do A.Contains(B) once, instead of B.Contains(A) one million times?

This is assuming that's how intersects work beneath the hood, as opposite to some fancy algorithm I am not aware of.

UPDATE:

OK so for c# you should clearly do B.InsersectWith(A), if B << A. Intersect() is defined on IEnumerable and would be a lot less efficient based on the answers below (and MSDN). So the order does matter if you use the best tool, which is IntersectWith().

Upvotes: 0

Views: 2244

Answers (3)

paparazzo
paparazzo

Reputation: 45096

From the documentation

If the collection represented by the other parameter is a HashSet collection with the same equality comparer as the current HashSet object, this method is an O(n) operation. Otherwise, this method is an O(n + m) operation, where n is Count and m is the number of elements in other.

HashSet.IntersectWith Method

And if you are looking for speed implement (override) GetHashCode if you can derive a meaningful Hash from your data. And override Equal. I do this for any class that will be in a collection.

Object.GetHashCode Method

Upvotes: 3

Jason Coyne
Jason Coyne

Reputation: 6636

The code is written to be for the general case. If you are a special case like that, you should implement custom logic that is efficient for your particular use case.

The Contains() method just iterates through the list until it finds a match, so the order would definately be important if thats what it was doing, but I believe the other answer is correct in terms of how it works, because that means iterating each item a maximum of one time, vs a "Contains" solution could iterate the entire "child" list for each element in the master list.

Actual solution = x+y iterations Contains solution = x+(x*y) iterations

Upvotes: 0

coredump
coredump

Reputation: 3107

It depends if you are asking as a general question or for a specific language.

In Java, it will iterate over the second set and then over the first to see if it contains that element. So it will still iterate over both sets.

In c#, what the method does is to enumerate on the elements of the first set (A), then enumerate on the elements of the second set(B) and mark those that are common, afterwards it yields those elements in that order.

So, to answer your question, I would say that it does not. This is it has to go through every container

Upvotes: 0

Related Questions