Reputation: 35
I have been doing some reading on this topic, so far from what I understand for Adding, Removing and Search operations, HashSet is faster with O(1) time complexity while TreeSet gets O(log n) for the same operations. When iterating through the elements both HashSet and TreeSet have the time complexity of O(n).
So what is a use case when TreeSet is faster than HashSet?
Upvotes: 2
Views: 1405
Reputation: 46960
In general you can best compare capabilities of Java container classes by looking at the interfaces they implement. Checking the HashSet javadoc, you'll see it has Iterable<E>, Collection<E>, Set<E>
. TreeSet has Iterable<E>, Collection<E>, NavigableSet<E>, Set<E>, SortedSet<E>
.
So the difference is SortedSet
and NavigableSet
. Those are methods TreeSet offers and HashSet doesn't. If in turn you look up their javadoc, you'll find a range of behaviors organized to exploit the ordering of elements in a TreeSet. HashSets have no concept of element ordering. That's the main difference. If you want to impose an order on the elements, you're generally limited to sorting them separately, whereas traversing TreeSets in their natural order is an amortized constant time per item. (Individual steps of the traversal can take time proportional log n.)
In practice there aren't too many use cases where the difference between the O(1) expected amortized time of HashSet performance and the O(log n) guaranteed time of TreeSet for the methods they have in common turns out to be important. Remember log_2(n) is for nearly all practical purposes less than 40. Doing a few instructions 40 times often turns out to be noise in the performance of the calling algorithm.
When the difference is important, you still need to account for the the expected amortized nature of the hash performance, since any given add()
can require O(n) time to expand the internal bucket array and rehash all the contents. In some applications, this is a killer. For example, your game normally runs like lightning, but once in a while there's a stutter while a 10 Mb hash set is grown to 20 Mb. Similarly, if your data just happens to work badly with the HashMap's hash function (or maybe the data are coming from malicious users who are deliberately trying to break it), performance can be more like O(n) than O(1).
TreeSet's performance doesn't have such grand performance quirks. E.g. reorganizing a red-black tree can take time proportional only to log_(n), and that' rare. To wit, later versions of HashSet actually use tree sets for buckets to avoid exploitation by bad guys.
Upvotes: 9
Reputation: 424973
Technically, the two can’t be fairly compared. A HashSet implements Set, while a TreeSet implements NavigableSet, which has extra functionality based around the concept of its elements (although there is no requirement for the implementation to actually order them).
A HashSet is faster (O(1) vs O(log n) than a TreeSet for all Set methods.
A TreeSet offers NavigableSet methods (eg ceiling()
that are O(log n), which are “faster” only because they don’t exist for Set, so it’s a no contest.
A TreeSet also iterates over its elements in Comparable order in O(n) time, whereas a HashSet just can’t do this; you’d have to iterate over the Set to collect the elements in a list, then sort the list - effective time complexity O(n log n).
Upvotes: 0
Reputation: 718678
TreeSet
is faster that HashSet
in some use-cases where ordering is relevant to the task that you are performing.
For example, if I have a set of strings and I want to find the smallest (according to the ordering) string in the set that is greater or equal to a given string.
HashSet
I have to iterate the entire set to find the string ... in the case where the given string is not in the set. That is O(N)
.TreeSet
that uses the required ordering, I can use ceiling
to find the required string in O(logN)
.Another example, if I want to iterate the strings set in order, that is O(N)
for a TreeSet
. For a HashSet
I have to extract the strings to an array, sort the array, and iterate that. All in all that is O(NlogN)
.
Caveats:
Complexity and performance are not the same thing. For instance, an O(N)
solution can be faster than an O(NlogN)
solution when N
is relatively small.
Java HashSet
operations are no longer O(1)
when the set size exceeds 231, because the standard HashSet
implementation uses a Java array as the hash array, and cannot resize beyond that.
Upvotes: 4
Reputation: 386
The added value of TreeSet is not the complexity but the type of the data structure. The complexity of hashset is better than that of treeset in any case, except in the case of an iteration, they have the same complexity.
HashSet: the add, remove, and contains methods has constant time complexity O(1).
TreeSet: the add, remove, and contains methods has time complexity of O(log (n)).
TreeSet offers several methods that hashset does not, for example to deal with the ordered set like first(), last(), headset(), tailset().
So to solve some problems TreeSet is more suitable so your program performance will be better than if you used HashSet.
Upvotes: 1
Reputation: 198014
Of the actual methods that are both defined on TreeSet
and HashSet
, none are reliably faster on TreeSet
. There are additional methods on TreeSet
that couldn't be implemented as efficiently on HashSet
, so they aren't -- methods such as floor
and ceiling
.
Upvotes: 1