Reputation: 169
I am using HashSets in an implementation to have fast adding, removing and element testing (amortized constant time). However, I'd also like a method to obtain an arbitrary element from that set. The only way I am aware of is
Object arbitraryElement = set.iterator().next();
My question is: how fast (asymptotically speaking) is this? Does it work in (not amortized) constant time in the size of the set, or does the iterator().next()
method do some operations that are slower? I ask because my experiments suggest I lose a log factor in my implementation, and this is one of the few lines that could be responsible.
Thank you very much!
Upvotes: 2
Views: 2264
Reputation: 13086
If you are repeatedly choosing an arbitrary set element using an iterator and often removing that element, the buckets at the front of the backing table gradually empty out, and finding the first element degrades to linear time because every call must scan past them.
This is actually a pretty common occurrence when implementing algorithms involving graph traversal.
Use a LinkedHashSet to avoid this problem.
Demonstration:
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;
public class SetPeek {
private static final Random rng = new Random();
private static <T> T peek(final Iterable<T> i) {
return i.iterator().next();
}
private static long testPeek(Set<Integer> items) {
final long t0 = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
peek(items);
}
final long t1 = System.currentTimeMillis();
return t1 - t0;
}
private static <S extends Set<Integer>> S createSet(Supplier<S> factory) {
final S set = rng.ints(100000).boxed()
.collect(Collectors.toCollection(factory));
// Remove the first half of the elements in internal iteration
// order. HashMap never shrinks its table on removal, so the
// emptied buckets at the front of the table stay in place.
final Iterator<Integer> it = set.iterator();
for (int k = 0; k < 50000; k++) {
it.next();
it.remove();
}
return set;
}
public static void main(String[] args) {
final long hs = testPeek(createSet(HashSet::new));
System.err.println("HashSet: " + hs + " ms");
final long lhs = testPeek(createSet(LinkedHashSet::new));
System.err.println("LinkedHashSet: " + lhs + " ms");
}
}
Results:
HashSet: 6893 ms
LinkedHashSet: 8 ms
Upvotes: 0
Reputation: 6099
If by "arbitrary" you mean a uniformly random element, you could use the following approach.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

class MySet<A> {
    private final ArrayList<A> contents = new ArrayList<>();
    private final HashMap<A, Integer> indices = new HashMap<>();
    private final Random rng = new Random();

    // selects a uniformly random element in O(1) time
    A randomKey() {
        return contents.get(rng.nextInt(contents.size()));
    }

    // adds a new element in amortized O(1) time
    void add(A a) {
        if (indices.containsKey(a)) return; // keep set semantics
        indices.put(a, contents.size());
        contents.add(a);
    }

    // removes an element in O(1) time by swapping the last element
    // into the removed slot so the list stays dense
    void remove(A a) {
        int index = indices.remove(a); // assumes a is present
        int lastIndex = contents.size() - 1;
        A last = contents.get(lastIndex);
        contents.set(index, last);
        contents.remove(lastIndex);
        if (index != lastIndex) {
            indices.put(last, index); // record the moved element's new position
        }
    }

    // all other operations (contains(), ...) are those from indices.keySet()
}
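A quick usage sketch (hypothetical values, using the class above):

MySet<String> set = new MySet<>();
set.add("a");
set.add("b");
set.add("c");
String pick = set.randomKey(); // uniform over {"a", "b", "c"}
set.remove(pick);              // still O(1)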
Upvotes: 0
Reputation: 3264
HashSet.iterator().next() linearly scans the table to find the next contained item.
With the default load factor of .75, you would have at most three full slots for every empty one just before the table resizes.
There is, of course, no guarantee how the objects will be distributed in the backing array, and the set will usually be less full than that, so scans can take longer.
I think you'd get amortized constant time.
Edit: The iterator does not create a deep copy of anything in the set. It only references the internal array of the HashSet. Your example creates a few objects, but nothing more, and no big copies.
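To make the scan concrete, here is a minimal sketch of the advance step (my own simplification for illustration, not the actual JDK source):

// Model of how next() finds the first element: walk the bucket array
// past empty slots until an occupied one is found. The cost of a
// single call is the number of empty buckets skipped.
static <T> T firstElement(T[] buckets) {
    int i = 0;
    while (i < buckets.length && buckets[i] == null) {
        i++; // each empty bucket costs one extra step
    }
    return i < buckets.length ? buckets[i] : null;
}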
Upvotes: 4
Reputation: 16528
This is from the JDK 7 JavaDoc for HashSet:
Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
I looked at the JDK 7 implementation of HashSet and LinkedHashSet. For the former, the next operation is a linked-list traversal within a hash bucket, and between buckets an array traversal, where the size of the array is given by capacity(). The latter is strictly a linked-list traversal.
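That JavaDoc claim is easy to observe. Below is a rough benchmark sketch (illustrative only, not rigorous; the capacity 1 << 24 is an arbitrary choice) comparing iteration over two sets holding the same 1000 elements but with very different capacities:

import java.util.HashSet;
import java.util.Set;

public class CapacityCost {
    public static void main(String[] args) {
        Set<Integer> small = new HashSet<>();       // default capacity (16)
        Set<Integer> huge = new HashSet<>(1 << 24); // ~16M buckets
        for (int i = 0; i < 1000; i++) {
            small.add(i);
            huge.add(i);
        }
        time("default capacity", small);
        time("oversized capacity", huge);
    }

    static void time(String label, Set<Integer> s) {
        final long t0 = System.nanoTime();
        long sum = 0;
        for (int x : s) {
            sum += x; // full iteration touches every bucket
        }
        final long t1 = System.nanoTime();
        System.err.println(label + ": " + (t1 - t0) / 1000 + " us (sum=" + sum + ")");
    }
}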
Upvotes: 0
Reputation: 106401
Getting the first element out of a HashSet using the iterator is pretty fast: I think it's amortised O(1) in most cases. This assumes the HashSet is reasonably well-populated for its given capacity - if the capacity is very large compared to the number of elements then it will be more like O(capacity/n), which is the average number of buckets the iterator needs to scan before finding a value.
Even scanning an entire HashSet with an iterator is only O(n + capacity), which is effectively O(n) if your capacity is appropriately scaled. So it's still not particularly expensive (unless your HashSet is very large).
If you want better than that, you'll need a different data structure.
If you really need the fast access of arbitrary elements by index then I'd personally just put the objects in an ArrayList which will give you very fast O(1) access by index. You can then generate the index as a random number if you want to select an arbitrary element with equal probability.
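A minimal sketch of that approach (the values are just for illustration):

List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));
Random rng = new Random();
String arbitrary = items.get(rng.nextInt(items.size())); // O(1), uniform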
Alternatively, if you want to get an arbitrary element but don't care about indexed access then a LinkedHashSet may be a good alternative.
Upvotes: 1
Reputation: 198471
I wouldn't expect this to be a logarithmic factor, on average, but it might be slow in some rare cases. If you care about this, use LinkedHashSet, which will guarantee constant time.
Upvotes: 2
Reputation: 120318
I would maintain an ArrayList of your keys, and when you need a random object, just generate an index, grab the key, and pull it out of the set. O(1) baby...
Upvotes: 1