Reputation: 117
My Set
is sometimes sorted, and sometimes not.
Here is the example:
public class SetOfInteger {
public static void main(String[] args) {
Random rand = new Random(47);
Set<Integer> intset = new HashSet<>();
for (int i = 0; i < 10; i++) {
int j = rand.nextInt(30);
System.out.print(j + " ");
intset.add(j);
}
System.out.println();
System.out.println(intset);
}
}
The result shows that the set
is not sorted.
8 5 13 11 1 29 28 20 12 7
[1, 20, 5, 7, 8, 11, 12, 29, 28, 13]
When I change the termination expression to i < 20
in the for statement, the result shows that the set
become sorted.
8 5 13 11 1 29 28 20 12 7 18 18 21 19 29 28 28 1 20 28
[1, 5, 7, 8, 11, 12, 13, 19, 18, 21, 20, 29, 28]
It is so strange, is it? I just don't know how to explain it, and I need some help, Thank you very much.
Upvotes: 5
Views: 1311
Reputation: 621
There are some good answers all around, but none attempt to explain what exactly happens in this particular situation, so I'll limit my answer to that, rather than add another explanation of how the HashSet works. I'm taking that understanding as granted.
The default constructor of HashSet creates a set with a capacity of 16 and a load factor of 0.75. That means there are 16 bins, and this capacity is increased when you insert 16 * 0.75 = 12 unique elements.
That's why in the first case, the numbers are sorted by their remainder when divided by 16: the set started with a table size of 16, "hashing" each element to a bin by taking x % 16
. Then when there were 12 elements, it grew the table and performed a rehash (see Javier Martin's answer if that's not clear), probably growing the table to 32. (I could only find information about how it grows in the java 6 doc, which states that the number of buckets is "approximately" doubled, whatever that means.) That gave each integer under 30 its own bin, so when the set iterated over each bin in order, it iterated over the numbers in order. If you inserted numbers below 64, you'd probably find that you need to insert 32*0.75 = 24 elements before the iteration appears sorted.
Also note that this way of assigning bins is not guaranteed behavior. HashSets in other Java versions/implementations might do something more complicated with the objects' hashCode()
values than simply taking a remainder. (As noted by ruakh and fluffy in the comments - thanks!)
Upvotes: 6
Reputation: 5316
Interesting question. Set uses array of linked list
to store its elements. hashCode()
is used to find the position (indirectly) of the object to be stored in the Set
.
In case there are two objects needing to be stored in same position then the object is stored in the next slot of the linked list at that position.
The size of the array is dynamic and computed run time as per the number of objects in it. Its not sure but I assume you see your numbers as sorted because the Set might have increased the size. The hashCode()
is dependent upon the number value and so would have been computed sequentially. As the size of the underlying array would have increased with the increase of size of loop. There would have been no collisions and the output is sorted.
But Still I would like to emphasis so that my answer does not lead to any misconception. HashSet
does not guarantee any ordering of the elements
Upvotes: 3
Reputation: 11463
Your question points out that item order changes as the set grows bigger. However, you can't count on the order being preserved. A Set
has one guarantee: there is only one of each kind of element. There are other Set
objects that provide further guarantees, but a simple HashSet
provides no guarantee of order.
The re-ordering you see is simply an internal reshuffling due to the way the HashSet is stored internally. In a very simplified way of thinking, the HashSet has a certain number of "slots" to store values which is usually an odd number if not also prime. The hashcodes from getHashCode()
are used to assign the object to a slot. When you have a hash code collision, then the HashSet uses the equality operator equals()
to determine if the objects are in fact unique.
As you add items to a HashSet
several things happen:
HashSet
needs to resize itself
The bottom line is that if the objects magically sorted themselves, that's not an implementation you can count on unless you are using a TreeSet
which imposes a sort order on the set items.
Upvotes: 5
Reputation: 70564
A HashSet does not guarantee sorted iteration, but under very specific circumstances its internal data structure may act like a bucket sort.
Specifically, for integer keys in the range [0,65535] and a table size that is greater than the largest key, the index of the bucket a key is stored in is equal to the key itself, and since the iterator iterates in bucket order, it emits the elements in sorted order.
Upvotes: 14
Reputation: 2605
The iteration order of a HashSet is not defined, the only guarantee is that it is consistent: iterating over a HashSet that has not been modified will produce the same sequences.
Internally, as a commenter said, the class uses the hashCode method of each element to store them in a certain number of bins. So, for example, if it's using 20 bins then it could take o.hashCode() % 20
as the bin index. Each bin can have several items in a list, which are then distinguished by the equals method. Thus, even if the hash of an Integer is its int value, the order need not be the natural integer ordering.
Furthermore, the set monitors its load factor when inserting and removing elements; considering the fraction of free bins, the maximum bin list size, the average number of items per bin, whatever. When it considers appropriate it performs a rehash, which means changing the number of bins that is used to store the elements, so their bin index changes because the n in the o.hashCode() % n
changes. Every element is "reshuffled" to its new place (this is a costly operation), thus explaining the different ordering you see after adding more elements.
Upvotes: 3
Reputation: 1377
You must sort it manually, because there is no guarantee that the hashset will be sorted. If you want you can use also TreeSet which will provide the functionality you want, but if you want to use HashSet anyway try this:
Set intset = new HashSet();
List sortedIntList = new ArrayList(intset);
Collections.sort(sortedIntList);
Upvotes: 1