George

Reputation: 120

Java HashSet worst case lookup time complexity

If hashtables/maps with closed hashing are worst-case O(n), are HashSets also going to require O(n) time for lookup, or is it constant time?

Upvotes: 2

Views: 2603

Answers (4)

alexcornejo

Reputation: 163

I see a lot of people saying the worst case is O(n). This is because the old HashSet implementation used a LinkedList to handle collisions within a bucket. However, that is no longer the whole story.

In Java 8, that LinkedList is replaced by a balanced binary tree once the number of collisions in a bucket grows past a threshold. This improves the worst-case performance from O(n) to O(log n) for lookups.

You can check additional details here.
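To make the treeification concrete, here's a small self-contained sketch (my own example, not from the linked details): a hypothetical key type whose hashCode always collides. On Java 8+, once a bucket accumulates enough entries, its linked list is converted to a red-black tree, and because the key implements Comparable, lookups in that bucket are O(log n):

import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hashCode always collides (illustration only).
class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // every key lands in the same bucket
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) { // lets the tree order the keys
        return Integer.compare(id, other.id);
    }
}

public class TreeifyDemo {
    public static void main(String[] args) {
        Set<BadKey> set = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            set.add(new BadKey(i)); // all entries collide into one bucket
        }
        // Still fast on Java 8+: the bucket is a balanced tree, so this
        // lookup is O(log n) rather than a 100,000-element list scan.
        System.out.println(set.contains(new BadKey(99_999))); // prints true
    }
}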

Upvotes: 0

anthony_718

Reputation: 474

Worst case is O(n), as mentioned; the average and amortized runtime is constant.

From GeeksForGeeks: The underlying data structure for HashSet is a hashtable, so the amortized (average or usual case) time complexity for the add, remove, and look-up (contains method) operations of HashSet is O(1).
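For reference, a minimal usage sketch of the three operations being discussed (my own example):

import java.util.HashSet;
import java.util.Set;

public class HashSetOps {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("apple");                          // average O(1) insert
        System.out.println(set.contains("apple")); // average O(1) lookup: true
        set.remove("apple");                       // average O(1) removal
        System.out.println(set.contains("apple")); // false
    }
}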

Upvotes: 0

Brent Writes Code

Reputation: 19623

If you look at the implementation of a HashSet (e.g. from OpenJDK 8: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashSet.java), you can see that it's actually just built on top of a HashMap. Relevant code snippet here:

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable
{
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

The HashSet slightly optimizes memory usage by creating a single static dummy Object named PRESENT and using it as the value part of every key/value entry in the backing HashMap.

So whatever the performance implications are of using a HashMap, a HashSet will have more or less the same ones since it's literally using a HashMap under the covers.

To directly answer your question: yes, just as the worst-case complexity of a HashMap is O(n), so too the worst-case complexity of a HashSet is O(n).

It is worth noting that, unless you have a really bad hash function or a ridiculously small hashtable, you're very unlikely to see worst-case performance in practice. Every element would have to hash to the exact same bucket, so performance would essentially degrade to a linked-list traversal (assuming a hashtable that uses chaining for collision handling, which the Java implementations do).
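As an aside (my own example, not from the answer above), distinct keys really can share a bucket: in Java, the strings "Aa" and "BB" have the same hashCode, and longer strings built from those two-character blocks collide as well, which is how an adversary can force the degenerate case:

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112 ('A'*31 + 'a' == 'B'*31 + 'B').
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        // Concatenations of colliding blocks also collide, so arbitrarily
        // many distinct keys can be forced into a single bucket.
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // true
    }
}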

Upvotes: 1

Mureinik

Reputation: 312219

When looking up an element in a HashMap, it performs an O(1) calculation to find the right bucket, and then iterates over the items there serially until it finds the one that is equal to the requested key, or until all the items have been checked.

In the worst case scenario, all the items in the map have the same hash code and are therefore stored in the same bucket. In this case, you'll need to iterate over all of them serially, which would be an O(n) operation.

A HashSet is just a HashMap where you don't care about the values, only the keys - under the hood, it's a HashMap where all the values are a dummy Object.
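A stripped-down sketch of that idea (simplified from the OpenJDK source linked in another answer; not the complete class):

import java.util.HashMap;

// Minimal sketch of a HashSet built on a HashMap, mirroring OpenJDK's approach.
public class SimpleHashSet<E> {
    private static final Object PRESENT = new Object(); // shared dummy value
    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e)           { return map.put(e, PRESENT) == null; }
    public boolean contains(Object o) { return map.containsKey(o); } // same cost as a HashMap key lookup
    public boolean remove(Object o)   { return map.remove(o) == PRESENT; }
    public int size()                 { return map.size(); }
}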

Upvotes: 5
