user1111929
user1111929

Reputation: 6099

TreeSet vs HashSet speed for small set size, when O(1) vs O(log n) doesn't matter

I've been reading up on complexity of HashSet vs TreeSet, and everywhere I find topics explaining: "HashSet is faster because it's O(1) instead of O(log n)." Of course, I know this.

However, that only holds if you're working with very large sets. I, on the other hand, need to work with millions of "small" sets, each containing at most 200 objects, and most far less (below 20). The operations on them are very diverse (creating, adding, removing, membership testing, cloning, ...) and hence I'm puzzled on how to best measure/simulate the difference.

Which of both classes will have the least overhead for such small set sizes? Both in terms of speed and of memory overhead. And what about LinkedHashSet?

Upvotes: 3

Views: 1560

Answers (2)

M.P. Korstanje
M.P. Korstanje

Reputation: 12019

There is no clear cut answer to this question because it all depends. So this will just be a scatter shot of comments.

  1. Hashset is much faster then a tree set. Even for small sets.
  2. If computing your hash and equals is very expensive consider wrapping your items in a class that only uses the specific identifying information for your use case. If items are immutable and reused often between hashsets consider caching the hash.
  3. Use a profiler to determine which solution works best on real data.

Upvotes: 1

the8472
the8472

Reputation: 43042

and hence I'm puzzled on how to best measure/simulate the difference.

Use a profiler. If the Set operations don't dominate the results (CPU time, memory footprint, allocation rates) then your choice will not make a difference in practice due to amdahl's law..

The biggest advantage of the TreeSet is ordering.

And neither implementation is particularly memory-efficient, there are better Sets out there, depending on which performance measure you care most about. They are wrappers around the corresponding Map implementations and the Maps themselves are not particularly efficient either.

They're more designed for flexibility, providing the vast set of APIs they do than optimizing any particular performance aspect.

Upvotes: 1

Related Questions