Reputation: 73
I am preparing for a UIL computer contest, and one of the practice questions asked involved a set, here it is, word for word:
HashSet<String> set = new HashSet<String>();
set.add("000");
set.add("212");
set.add("211");
set.add("555");
set.add("343");
System.out.println(set);
Now, I understand that HashSets are unsorted data structures, but there were 2 unsorted choices shown, and 1 sorted answer choice:
A) [000, 211, 343, 212, 555]
B) [000, 211, 212, 343, 555]
c) [000, 211, 555, 343, 212]
I naively picked B) the sorted answer, while the correct answer is A, but I still do not understand why. I found the hash code generated by each of the string's hash functions:
"000" - 47664
"212" - 49619
"211" - 49618
"555" - 52629
"343" - 50674
To my knowledge the hash set utilizes a hashtable as its backend. In that case, based on the hash codes, I don't understand why answer is incorrect. I plugged the code into java and it produces the result in the correct answer choice. What is going on here, how exactly does a HashSet add items to itself?
Upvotes: 0
Views: 93
Reputation: 2227
1) The question is wrong. According to the specification of Class HashSet,
It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
As the result depends on the implementation the question should not have been asked or alternatively the answer should have been "None of the above".
2) The actual implementation has changed. The hash used is not the familiar hashCode of key
but (h = key.hashCode()) ^ (h >>> 16)
. This hash is then used for hash & (SIZE -1)
.
Upvotes: 0
Reputation: 415
"000" - 47664
"212" - 49619
"211" - 49618
"555" - 52629
"343" - 50674
Find the mod of each of them divided by 16. Then sort it. If there is a bucket match then last one added is first in the list ( since its "implemented as a stack" when there is a collision. )
Upvotes: 0
Reputation: 178333
A HashSet
uses a HashMap
as its backing data:
This class implements the
Set
interface, backed by a hash table (actually aHashMap
instance).
The HashSet
uses the HashMap
to quickly determine if an element is already present.
The HashMap
has a default capacity of 16
.
Constructs an empty
HashMap
with the default initial capacity (16) and the default load factor (0.75).
You've added 5 items, not nearly enough to meet the load factor times capacity, so no resizing is done.
You need to know how the HashMap
uses the hash code to determine which bucket to use. According to this answer, Java does hash & (SIZE -1)
to extract only the lowermost n bits, where the capacity is 2n.
So, the bucket numbers are as follows:
"000" - 47664 -> 0
"212" - 49619 -> 3
"211" - 49618 -> 2
"555" - 52629 -> 5
"343" - 50674 -> 2
Printing out the set uses an Iterator
that walks the buckets in order. Within the same bucket, insertion order is used.
Therefore the order is:
And that matches choice "A".
Upvotes: 1
Reputation: 73578
Because the backing array can be smaller than hashcodes, the order is not determined by only hashcode()
, but hashcode() % arr.length;
. The default size of the backing array is 16
, so if you calculate the modulo for the hashcodes you listed, you'll get the correct order.
For elements with same hashcodes, the one inserted first is also the first to be printed. This is due to the buckets holding lists of all objects in the same bucket (and using equals()
to determine whether they're the same object or if they just happened to get the same bucket).
Upvotes: 3