Sven
Sven

Reputation: 933

Get a random element from a sequential collection

I talk to an API that gives me an java.util.Iterator over a collection. That means I can iterate over it but I can't get direct/random access to the elements.

Now to my problem: I want to get one random element from this collection. How do i do that? I guess I could build a new collection that allows direct access, but isn't that a little memory consuming? I could also iterate over the entire collection and for each element "roll a dice" to see if I should take that element and quit iteration or continue. But then I need the size of the collection and I can't get that from the Iterator.

Thanks in advance.

Upvotes: 11

Views: 6825

Answers (6)

Jablub
Jablub

Reputation: 11

Used this for generating weighted test data. it's not efficient but is easy

class ProbabilitySet<E> {

    Set<Option<E>> options =  new HashSet<Option<E>>(); 

    class Option<E> {
        E object;
        double min;
        double max;

        private Option(E object, double prob) {
            this.object = object;
            min = totalProb;
            max = totalProb + prob;
        }

        @Override
        public String toString() {
            return "Option [object=" + object + ", min=" + min + ", max=" + max + "]";
        }
    }

    double totalProb = 0;
    Random rnd = new Random();

    public void add(E object, double probability){
        Option<E> tuple = new Option<E>(object, probability);
        options.add(tuple);
        totalProb += probability;
    }

    public E getRandomElement(){

        double no = rnd.nextDouble() * totalProb;
        for (Option<E> tuple : options) {
            if (no >= tuple.min && no < tuple.max){
                return tuple.object;
            }
        }


        return null;  // if this happens sumfink is wrong.

    }

    @Override
    public String toString() {
        return "ProbabilitySet [options=" + options + ", totalProb=" + totalProb + "]";
    }

}

NOTE: the probability parameters will be relative to the total not to 1.0

Usage:

public static void main(String[] args) {
    ProbabilitySet<String> stati = new ProbabilitySet<String>();
    stati.add("TIMEOUT", 0.2);
    stati.add("FAILED", 0.2);
    stati.add("SUCCESSFUL", 1.0);

    for (int i = 0; i < 100; i++) {
        System.out.println(stati.getRandomElement());
    }

}

Upvotes: 1

Dean Povey
Dean Povey

Reputation: 9446

If you really don't have any random access, and you have a very large list so you can't copy it then you can do the following:

int n = 2
iterator i = ...
Random rand = new Random();
Object candidate = i.next();
while (i.hasNext()) {
    if (rand.nextInt(n)) {
        candidate = i.next();
    } else {
        i.next();
    }
    n++;
}
return candidate;

This will retain a random element from a list, but requires you traverse the whole list. If you want a truly uniformly distributed value you have no choice but to do this.

Alternatively, if the number of items is small, or if you want a random permutation of a list of unknown size (in other words you want to access all the elements of the list in a random order), then I recommend copying all the references to a new list (this will not be a significant amount of memory unless you have millions of items since you are only storing references). Then either use get with a random integer or use the standard java.util.Collections shuffle method to permute the list.

Upvotes: 0

Bill the Lizard
Bill the Lizard

Reputation: 405725

There's a way to do it on one pass through the collection that doesn't use a lot of extra memory (just the size of one element of the collection plus a float). In pseudocode:

  • Iterate through the collection.
  • For each item, generate a random float.
  • If the float is the lowest (or highest, it doesn't matter) one you've seen so far, store the current item from the collection in a temporary variable. (Also store the new lowest random value.)
  • Once you reach the end of the collection, you have a random item in the temp variable.

Obviously this has the drawback of iterating through the entire collection every time you call it, but you don't have a lot of choice with the constraints you're facing.

Update: The name of this type of problem finally came back to me. This is called Reservoir sampling.

Upvotes: 10

Tom Hawtin - tackline
Tom Hawtin - tackline

Reputation: 147144

When iteration, you know how many objects you've iterated through, so you know the probability that the current element is the one to select randomly. So you just need to keep hold of a count and the current randomly selected item.

public static <T> T selectRandom(final Iterator<T> iter, final Random random) {
    if (!iter.hasNext()) {
        throw new IllegalArgumentException();
    }
    if (random == null) {
        throw new NullPointerException();
    }
    T selected = iter.next();
    int count = 1;
    while (iter.hasNext()) {
        final T current = iter.next();
        ++count;
        if (random.nextInt(count) == 0) {
            selected = current;
        }
    }
    return selected;
}

(Stack Overflow Disclaimer: Not compiled, and certainly not tested.)

See also the section about Collections.shuffle in Java Puzzlers.

Upvotes: 7

MRalwasser
MRalwasser

Reputation: 15953

The only safe solution (in case no further information is known/guaranteed) is the way you described: Create a List from the Iterator and pick a random element.

If the size of the underlying collection is always the same you can reduce the effort by a half in an average - just use the element you got after Iterator.next() after a random number of iterations.

BTW: Are you really using a Collection which implements java.util.Iterator?

Upvotes: 2

Bence Olah
Bence Olah

Reputation: 684

It depends on the requirements, if the size of the collection is not huge then this will do it, otherwise you should you iterate and use the dice method you mentioned

List<Object> list = Arrays.asList(yourCollection.toArray(new Object[0]));
result = list.get(new Random().nextInt(list.size()));

Upvotes: 1

Related Questions