Big Boss
Big Boss

Reputation: 73

String HashSet produces an output that doesn't match String Hash function

I am preparing for a UIL computer contest, and one of the practice questions asked involved a set, here it is, word for word:

HashSet<String> set = new HashSet<String>();
set.add("000");
set.add("212");
set.add("211");
set.add("555");
set.add("343");
System.out.println(set);

Now, I understand that HashSets are unsorted data structures, but there were 2 unsorted choices shown, and 1 sorted answer choice:

A) [000, 211, 343, 212, 555]

B) [000, 211, 212, 343, 555]

c) [000, 211, 555, 343, 212]

I naively picked B) the sorted answer, while the correct answer is A, but I still do not understand why. I found the hash code generated by each of the string's hash functions:

"000" - 47664

"212" - 49619

"211" - 49618

"555" - 52629

"343" - 50674

To my knowledge the hash set utilizes a hashtable as its backend. In that case, based on the hash codes, I don't understand why answer is incorrect. I plugged the code into java and it produces the result in the correct answer choice. What is going on here, how exactly does a HashSet add items to itself?

Upvotes: 0

Views: 93

Answers (4)

Jonathan Rosenne
Jonathan Rosenne

Reputation: 2227

1) The question is wrong. According to the specification of Class HashSet,

It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

As the result depends on the implementation the question should not have been asked or alternatively the answer should have been "None of the above".

2) The actual implementation has changed. The hash used is not the familiar hashCode of key but (h = key.hashCode()) ^ (h >>> 16). This hash is then used for hash & (SIZE -1).

Upvotes: 0

Tanvi Jaywant
Tanvi Jaywant

Reputation: 415

"000" - 47664

"212" - 49619

"211" - 49618

"555" - 52629

"343" - 50674

Find the mod of each of them divided by 16. Then sort it. If there is a bucket match then last one added is first in the list ( since its "implemented as a stack" when there is a collision. )

Upvotes: 0

rgettman
rgettman

Reputation: 178333

A HashSet uses a HashMap as its backing data:

This class implements the Set interface, backed by a hash table (actually a HashMap instance).

The HashSet uses the HashMap to quickly determine if an element is already present.

The HashMap has a default capacity of 16.

Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

You've added 5 items, not nearly enough to meet the load factor times capacity, so no resizing is done.

You need to know how the HashMap uses the hash code to determine which bucket to use. According to this answer, Java does hash & (SIZE -1) to extract only the lowermost n bits, where the capacity is 2n.

So, the bucket numbers are as follows:

"000" - 47664 -> 0

"212" - 49619 -> 3

"211" - 49618 -> 2

"555" - 52629 -> 5

"343" - 50674 -> 2

Printing out the set uses an Iterator that walks the buckets in order. Within the same bucket, insertion order is used.

Therefore the order is:

  1. "000" - 47664 -> 0
  2. "211" - 49618 -> 2
  3. "343" - 50674 -> 2
  4. "212" - 49619 -> 3
  5. "555" - 52629 -> 5

And that matches choice "A".

Upvotes: 1

Kayaman
Kayaman

Reputation: 73578

Because the backing array can be smaller than hashcodes, the order is not determined by only hashcode(), but hashcode() % arr.length;. The default size of the backing array is 16, so if you calculate the modulo for the hashcodes you listed, you'll get the correct order.

For elements with same hashcodes, the one inserted first is also the first to be printed. This is due to the buckets holding lists of all objects in the same bucket (and using equals() to determine whether they're the same object or if they just happened to get the same bucket).

Upvotes: 3

Related Questions