Reputation: 1218
I'm trying to write a function which draws randomly elements from a collection and adds them to a new one. So if you want to draw 3 elements from {1,2,3,4,5} you could get {5,3,4}. I came up with this generic function:
/**
* Take a random sample without repeats of the specified size from a
* collection. Note that if the collection you're sampling from contains
* repeated elements, then the sample could also contain repeated elements.
* Use a Set as an argument to avoid this.
*
* @param <T> The type of objects in the Collection
* @param <E> The type of the collection
* @param collection The collection
* @param size The sample size of elements which you wish to extract from
* the collection
* @param factory A factory method for the collection E. Call with
* "new::ArrayList" or something similar.
* @return A random sample of the collection consisting of 'size' elements
* without repeats (unless the original collection contained repeats).
* @throws IllegalArgumentException if size is larger than the collection.size().
*/
public static <T, E extends Collection<T>> E drawRandomlyWithoutReplacement(List<T> collection, int size, Supplier<E> factory) {
if (size > collection.size()) {
throw new IllegalArgumentException("The sample size cannot be greater than the size of the collection.");
}
E list = factory.get();
for (int i = 0; i < size; i++) {
int r = MathUtils.randomInt(0, collection.size() - 1);
list.add(collection.remove(r));
}
return list;
}
Unfortunately the Collection interface does not have a function which returns an element from the collection if you remove it, but List and Vector (among others) do have it. Is there a way I can make this function work for Lists and Vectors without having to overload it 3 times? I tried making the first argument being of type C
where C extends List<T> | Vector<T>
but unfortunately this didn't work.
Upvotes: 3
Views: 117
Reputation: 11740
The Collection
interface can't do a remove but its Iterator
.
public static <T, E extends Collection<T>> E drawRandomlyWithoutReplacement(Collection<T> collection, int size,
Supplier<E> factory) {
final int colSize = collection.size();
if (size > colSize) {
throw new IllegalArgumentException("The sample size cannot be greater than the size of the collection.");
} else if(size == 0) {
return factory.get();
}
Random rand = new Random();
Set<Integer> sampleIndices = new TreeSet<>();
while (sampleIndices.size() < size) {
sampleIndices.add(rand.nextInt(colSize));
}
E result = factory.get();
Iterator<T> collectionIterator = collection.iterator();
Iterator<Integer> indexIterator = sampleIndices.iterator();
int sampleIndex = indexIterator.next();
for (int i = 0; i < colSize; i++) {
T sample = collectionIterator.next();
if (i == sampleIndex) {
result.add(sample);
collectionIterator.remove();
if (indexIterator.hasNext()) {
sampleIndex = indexIterator.next();
} else {
break;
}
}
}
return result;
}
Upvotes: 3
Reputation: 23562
Let the collection
be any collection (Collection<T> collection
).
E list = factory.get();
List<T> colAsList = new ArrayList<>(collection);
for (int i = 0; i < size; i++) {
int r = MathUtils.randomInt(0, colAsList.size() - 1);
list.add(colAsList.remove(r));
}
collection.removeAll(list);
return list;
Upvotes: 1
Reputation: 472
For collections of unknown size or streams of elements (in your case an iterator) there is a family of algorithms with the exact purpouse that you need called reservoir sampling.
Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number.
The following algorithm runs in linear time and is guaranteed to pick numbers with the needed probability.
public static <T> List<T> reservoirSample(Iterable<T> items, int m){
ArrayList<T> res = new ArrayList<T>(m);
int count = 0;
for(T item : items){
count++;
if (count <= m)
res.add(item);
else{
int r = rnd.nextInt(count);
if (r < m)
res.set(r, item);
}
}
return res;
}
This sample is taken from this blogpost, where you can further read about the topic.
Upvotes: 0
Reputation: 76
How about List, ArrayList, Vector are the subclasses and it defines
E remove(int index) //removing and returning
See http://docs.oracle.com/javase/7/docs/api/java/util/List.html
Thanks Marco for pointing it out, i initially suggested AbstractList
Upvotes: 1