Reputation: 10161
this is more like a puzzle. I wanted to know if there is a way to choose k random items from a list of n items, given that n is unknown, and I only want to read the list of items once.
Thank you
Upvotes: 1
Views: 427
Reputation: 10161
I guess the answer to my question is this:
pick first k elements and store them into an array of length k
for each element x > k
insert x with probability k/x
choose position at random between 1 and k
Upvotes: 2
Reputation: 29540
Upvotes: 0
Reputation: 24207
Easy (if k <=n). It is like getting a list of k numbers < n. This will be the list of positions of numbers to get. Create a list of range(0..n), get k random numbers from it. You won't have to read the actual list of items until the last moment. Obviously this is only useful is the final list of items is slow to read (it is read from disk or something like that).
To get the positions of items to pick just do:
import random
itemstopick = random.Random().sample(range(0,n), k)
If n, number of items is unknown, then you must start by picking the first k items (that is the solution if k = n). Then the only choice yu have is continue reading items and either choosing to keep the new item just read (and to remove another item) or to keep the current items as they are. To stick with a uniform probability you will have to decrease the probability to pick the last read item as you go on. Probability to keep the last item should always be P(k/n0) with n0 being the value of n at that time. I don't believe you can do better than that.
If you know some minorant of n (a value you can guarantee n is greater that it), just mix the two methods above. Start with a list created using minorant instead of n, then continue as for unknown n.
Upvotes: 1
Reputation: 15030
It depends on whether you have the random values generated or not, if you do, than it is possible, if not you will have to generate them, and you will need around from 2*k to 3*k operation in that case
Upvotes: 0