Reputation: 12915
I found this post:
Efficiently selecting a set of random elements from a linked list
But this means that in order to approach true randomness in the sample I have to iterate over all elements, throw them in memory with a random number, and then sort. I have a very large set of items here (millions) - is there a more efficient approach to this problem?
Upvotes: 3
Views: 2595
Reputation: 1500903
I would suggest simply shuffling elements as if you were writing a modified Fisher-Yates shuffle, but only bother shuffling the first k
elements. For example:
public static void PartialShuffle<T>(IList<T> source, int count, Random random)
{
for (int i = 0; i < count; i++)
{
// Pick a random element out of the remaining elements,
// and swap it into place.
int index = i + random.Next(source.Count - i);
T tmp = source[index];
source[index] = source[i];
source[i] = tmp;
}
}
After calling this method, the first count
elements will be randomly picked elements from the original list.
Note that I've specified the Random
as a parameter, so that you can use the same one repeatedly. Be careful about threading though - see my article on randomness for more information.
Upvotes: 11
Reputation: 112402
If the elements can be in memory, put them in memory first
List<Element> elements = dbContext.Select<Element>();
Now you know the number of elements. Create a set of unique indexes.
var random = new Random();
var indexes = new HashSet<int>();
while (indexes.Count < k) {
indexes.Add(random.Next(elements.Count));
}
Now you can read the elements from the list
var randomElements = indexes.Select(i => elements[i]);
I assume that the DB contains unique elements. If this is not the case, you will have to create a HashSet<Elements>
instead or to append .Distinct()
when querying from the DB.
UPDATE
As Patricia Shanahan says, this method will work well if k is small compared to n. If it is not the case, I suggest selecting a set n - k indexes to be excluded
var random = new Random();
var indexes = new HashSet<int>();
IEnumerable<Element> randomElements;
if (k <= elements.Count / 2) {
while (indexes.Count < k) {
indexes.Add(random.Next(elements.Count));
}
randomElements = indexes.Select(i => elements[i]);
} else {
while (indexes.Count < elements.Count - k) {
indexes.Add(random.Next(elements.Count));
}
randomElements = elements
.Select((e,i) => indexes.Contains(i) ? null : elements[i])
.Where(e => e != null);
}
Upvotes: 3
Reputation: 6373
Take a look at this extension method http://extensionmethod.net/csharp/ienumerable-t/shuffle. You could add Skip() Take() type to page the values out the final list.
Upvotes: 3