Kirby
Kirby

Reputation: 3709

What is the lookup time complexity of HashSet<T>(IEqualityComparer<T>)?

In C#.NET, I like using HashSets because of their supposed O(1) time complexity for lookups. If I have a large set of data that is going to be queried, I often prefer using a HashSet to a List, since it has this time complexity.

What confuses me is the constructor for the HashSet, which takes IEqualityComparer as an argument:

http://msdn.microsoft.com/en-us/library/bb359100.aspx

In the link above, the remarks note that the "constructor is an O(1) operation," but if this is the case, I am curious if lookup is still O(1).

In particular, it seems to me that, if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

Does the implementation internally construct a lookup table as elements are added to the collection?

In general, how might I ascertain information about complexity of .NET data structures?

Upvotes: 33

Views: 41597

Answers (5)

nikstffrs
nikstffrs

Reputation: 344

Actually the lookup time of a HashSet<T> isn't always O(1).

As others have already mentioned a HashSet uses IEqualityComparer<T>.GetHashCode().
Now consider a struct or object which always returns the same hash code x.

If you add n items to your HashSet there will be n items with the same hash in it (as long as the objects aren't equal).
So if you were to check if an element with the hash code x exists in your HashSet it will run equality checks for all objects with the hash code x to test wether the HashSet contains the element

Upvotes: 4

Eric Lippert
Eric Lippert

Reputation: 659956

if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

Let's call the value you are searching for the "query" value.

Can you explain why you believe the comparer has to be executed on every key to see if it matches the query?

This belief is false. (Unless of course the hash code supplied by the comparer is the same for every key!) The search algorithm executes the equality comparer on every key whose hash code matches the query's hash code, modulo the number of buckets in the hash table. That's how hash tables get O(1) lookup time.

Does the implementation internally construct a lookup table as elements are added to the collection?

Yes.

In general, how might I ascertain information about complexity of .NET data structures?

Read the documentation.

Upvotes: 18

Scott Stafford
Scott Stafford

Reputation: 44776

A HashSet works via hashing (via IEqualityComparer.GetHashCode) the objects you insert and tosses the objects into buckets per the hash. The buckets themselves are stored in an array, hence the O(1) part.

For example (this is not necessarily exactly how the C# implementation works, it just gives a flavor) it takes the first character of the hash and throws everything with a hash starting with 1 into bucket 1. Hash of 2, bucket 2, and so on. Inside that bucket is another array of buckets that divvy up by the second character in the hash. So on for every character in the hash....

Now, when you look something up, it hashes it, and jumps thru the appropriate buckets. It has to do several array lookups (one for each character in the hash) but does not grow as a function of N, the number of objects you've added, hence the O(1) rating.

To your other question, here is a blog post with the complexity of a number of collections' operations: http://c-sharp-snippets.blogspot.com/2010/03/runtime-complexity-of-net-generic.html

Upvotes: 33

sll
sll

Reputation: 62484

It would depends on quality of hash function (GetHashCode()) your IEqualityComparer implementation provides. Ideal hash function should provide well-distributed random set of hash codes. These hash codes will be used as an index which allows mapping key to a value, so search for a value by key becomes more efficient especially when a key is a complex object/structure.

the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

This is not how hashtable works, this is some kind of straightforward bruteforce search. In case of hashtable you would have more intelligent approach which uses search by index (hash code).

Upvotes: 3

phoog
phoog

Reputation: 43036

Lookup is still O(1) if you pass an IEqualityComparer. The hash set still uses the same logic as if you don't pass an IEqualityComparer; it just uses the IEqualityComparer's implementations of GetHashCode and Equals instead of the instance methods of System.Object (or the overrides provided by the object in question).

Upvotes: 1

Related Questions