Making a generic function work with multiple non-overlapping types

I'm trying to write a function which draws randomly elements from a collection and adds them to a new one. So if you want to draw 3 elements from {1,2,3,4,5} you could get {5,3,4}. I came up with this generic function:

/**
 * Take a random sample without repeats of the specified size from a
 * collection. Note that if the collection you're sampling from contains
 * repeated elements, then the sample could also contain repeated elements.
 * Use a Set as an argument to avoid this.
 *
 * @param <T> The type of objects in the Collection
 * @param <E> The type of the collection
 * @param collection The collection
 * @param size The sample size of elements which you wish to extract from
 * the collection
 * @param factory A factory method for the collection E. Call with
 * "new::ArrayList" or something similar.
 * @return A random sample of the collection consisting of 'size' elements
 * without repeats (unless the original collection contained repeats).
 * @throws IllegalArgumentException if size is larger than the collection.size().
 */
public static <T, E extends Collection<T>> E drawRandomlyWithoutReplacement(List<T> collection, int size, Supplier<E> factory) {
    if (size > collection.size()) {
        throw new IllegalArgumentException("The sample size cannot be greater than the size of the collection.");
    }

    E list = factory.get();
    for (int i = 0; i < size; i++) {
        int r = MathUtils.randomInt(0, collection.size() - 1);
        list.add(collection.remove(r));
    }
    return list;
}

Unfortunately the Collection interface does not have a function which returns an element from the collection if you remove it, but List and Vector (among others) do have it. Is there a way I can make this function work for Lists and Vectors without having to overload it 3 times? I tried making the first argument being of type C where C extends List<T> | Vector<T> but unfortunately this didn't work.

Upvotes: 3

Views: 117

Answers (4)

Flown
Flown

Reputation: 11740

The Collection interface can't do a remove but its Iterator.

public static <T, E extends Collection<T>> E drawRandomlyWithoutReplacement(Collection<T> collection, int size,
    Supplier<E> factory) {
  final int colSize = collection.size();
  if (size > colSize) {
    throw new IllegalArgumentException("The sample size cannot be greater than the size of the collection.");
  } else if(size == 0) {
    return factory.get();
  }

  Random rand = new Random();
  Set<Integer> sampleIndices = new TreeSet<>();
  while (sampleIndices.size() < size) {
    sampleIndices.add(rand.nextInt(colSize));
  }

  E result = factory.get();

  Iterator<T> collectionIterator = collection.iterator();
  Iterator<Integer> indexIterator = sampleIndices.iterator();
  int sampleIndex = indexIterator.next();
  for (int i = 0; i < colSize; i++) {
    T sample = collectionIterator.next();
    if (i == sampleIndex) {
      result.add(sample);
      collectionIterator.remove();
      if (indexIterator.hasNext()) {
        sampleIndex = indexIterator.next();
      } else {
        break;
      }
    }
  }
  return result;
}

Upvotes: 3

Dragan Bozanovic
Dragan Bozanovic

Reputation: 23562

Let the collection be any collection (Collection<T> collection).

E list = factory.get();
List<T> colAsList = new ArrayList<>(collection);
for (int i = 0; i < size; i++) {
    int r = MathUtils.randomInt(0, colAsList.size() - 1);
    list.add(colAsList.remove(r));
}
collection.removeAll(list);
return list;

Upvotes: 1

Milan Milanov
Milan Milanov

Reputation: 472

For collections of unknown size or streams of elements (in your case an iterator) there is a family of algorithms with the exact purpouse that you need called reservoir sampling.

Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number.

The following algorithm runs in linear time and is guaranteed to pick numbers with the needed probability.

public static <T> List<T> reservoirSample(Iterable<T> items, int m){
    ArrayList<T> res = new ArrayList<T>(m);
    int count = 0;
    for(T item : items){
        count++;
        if (count <= m)
            res.add(item);
        else{
            int r = rnd.nextInt(count);
            if (r < m)
                res.set(r, item);
        }
    }
    return res;
}

This sample is taken from this blogpost, where you can further read about the topic.

Upvotes: 0

whitecoffee
whitecoffee

Reputation: 76

How about List, ArrayList, Vector are the subclasses and it defines

E remove(int index) //removing and returning

See http://docs.oracle.com/javase/7/docs/api/java/util/List.html

Thanks Marco for pointing it out, i initially suggested AbstractList

Upvotes: 1

Related Questions