Reputation: 38899
Why is Hashset called a "Hash"-set?
I understand we call hashtable or a hashmap since it's a key value store and when we put(), then the key is hashed and distributed evenly using a good hash function.
I assume its called a HashSet because when we add(), the value is hashed and stored to keep it unique. But why the overkill? We don't really care about "equal distribution" of data like we do in a Hash table.
Upvotes: 9
Views: 3657
Reputation:
HashSet (like HashMap) uses, well, hashing, to achieve O(1) amortized set/test/remove performance. (There were some incorrect assumptions in the question about HashSet not using hashing.)
Now, in Java, all objects are "hashable" -- that is, they have a hashCode()
function (as they as descendants of Object). The quality of this hashing function will allow a hash algorithm to reach the above anticipated performance characteristics by "spreading the objects [uniformly] through the buckets". (The default Object implementations of hashCode/equals amount to object-identity. Generally this should be changed for any subclass.)
However, if your class implements hashCode
poorly (e.g. returns 1 for all values) then the performance of HashSet/HashMap will suffer greatly as a result (for any non-trivial n). It is important to note that hashCode
determines the bucket but equals
determines, well, the actual equality which may be used even if the hash code is unique and/or there are no collisions (for instance, to make sure the test/get doesn't return a false-positive -- it could conceivably be eliminated on a non-collision set/insert).
Just make sure to follow the requirements setup in Object wrt. hashCode
and equals
or objects may be lost. A poor hashing function which honors the rules will still work -- albeit at potentially poor performance. (Mutable object are especially problematic for use in hash ADTs because the hash code and/or equality may not always be stable.)
Upvotes: 2
Reputation: 310893
What 'overkill'? The idea of a HashXXX for any X is to provide O(1) performance, and that is accomplished by hashing. If you don't want O(1) performance, don't use it. Use a TreeSet for example.
Upvotes: 0
Reputation: 110046
A HashSet
is called a HashSet
because hashing is indeed important to its functionality. Operations like contains(Object)
(arguably the most important method in a Set
) and remove(Object)
are able to work in constant time by making use of the hash code of the object (by way of HashMap
).
Upvotes: 4
Reputation: 47964
We do care about equal distribution because we want constant time performance on our basic Collection
operations. In order to respect the basic rules of a SET
, no two objects are equal, we want to find a potentially equal match quickly. HashSet
is one fairly good way of doing it. Compare to a theoretical ArraySet
where adding a new element is a linear time operation to iterate and check every single existing entry for equality.
Upvotes: 13