zengr
zengr

Reputation: 38899

Why does HashSet have "Hash" in its name?

Why is Hashset called a "Hash"-set?

I understand we call hashtable or a hashmap since it's a key value store and when we put(), then the key is hashed and distributed evenly using a good hash function.

I assume its called a HashSet because when we add(), the value is hashed and stored to keep it unique. But why the overkill? We don't really care about "equal distribution" of data like we do in a Hash table.

Upvotes: 9

Views: 3657

Answers (4)

user166390
user166390

Reputation:

HashSet (like HashMap) uses, well, hashing, to achieve O(1) amortized set/test/remove performance. (There were some incorrect assumptions in the question about HashSet not using hashing.)

Now, in Java, all objects are "hashable" -- that is, they have a hashCode() function (as they as descendants of Object). The quality of this hashing function will allow a hash algorithm to reach the above anticipated performance characteristics by "spreading the objects [uniformly] through the buckets". (The default Object implementations of hashCode/equals amount to object-identity. Generally this should be changed for any subclass.)

However, if your class implements hashCode poorly (e.g. returns 1 for all values) then the performance of HashSet/HashMap will suffer greatly as a result (for any non-trivial n). It is important to note that hashCode determines the bucket but equals determines, well, the actual equality which may be used even if the hash code is unique and/or there are no collisions (for instance, to make sure the test/get doesn't return a false-positive -- it could conceivably be eliminated on a non-collision set/insert).

Just make sure to follow the requirements setup in Object wrt. hashCode and equals or objects may be lost. A poor hashing function which honors the rules will still work -- albeit at potentially poor performance. (Mutable object are especially problematic for use in hash ADTs because the hash code and/or equality may not always be stable.)

Upvotes: 2

user207421
user207421

Reputation: 310893

What 'overkill'? The idea of a HashXXX for any X is to provide O(1) performance, and that is accomplished by hashing. If you don't want O(1) performance, don't use it. Use a TreeSet for example.

Upvotes: 0

ColinD
ColinD

Reputation: 110046

A HashSet is called a HashSet because hashing is indeed important to its functionality. Operations like contains(Object) (arguably the most important method in a Set) and remove(Object) are able to work in constant time by making use of the hash code of the object (by way of HashMap).

Upvotes: 4

Affe
Affe

Reputation: 47964

We do care about equal distribution because we want constant time performance on our basic Collection operations. In order to respect the basic rules of a SET, no two objects are equal, we want to find a potentially equal match quickly. HashSet is one fairly good way of doing it. Compare to a theoretical ArraySet where adding a new element is a linear time operation to iterate and check every single existing entry for equality.

Upvotes: 13

Related Questions