zzzbbx
zzzbbx

Reputation: 10161

random items from a list

this is more like a puzzle. I wanted to know if there is a way to choose k random items from a list of n items, given that n is unknown, and I only want to read the list of items once.

Thank you

Upvotes: 1

Views: 427

Answers (4)

zzzbbx
zzzbbx

Reputation: 10161

I guess the answer to my question is this:

pick first k elements and store them into an array of length k
for each element x > k
   insert x with probability k/x
   choose position at random between 1 and k

Upvotes: 2

codymanix
codymanix

Reputation: 29540

  1. Skip a random number of items from the current position in list
  2. Take the current item.
  3. If you've reached end of the list, jump to the start of the list and go to step 1
  4. Repeat these steps k times.

Upvotes: 0

kriss
kriss

Reputation: 24207

Easy (if k <=n). It is like getting a list of k numbers < n. This will be the list of positions of numbers to get. Create a list of range(0..n), get k random numbers from it. You won't have to read the actual list of items until the last moment. Obviously this is only useful is the final list of items is slow to read (it is read from disk or something like that).

To get the positions of items to pick just do:

import random
itemstopick = random.Random().sample(range(0,n), k)

If n, number of items is unknown, then you must start by picking the first k items (that is the solution if k = n). Then the only choice yu have is continue reading items and either choosing to keep the new item just read (and to remove another item) or to keep the current items as they are. To stick with a uniform probability you will have to decrease the probability to pick the last read item as you go on. Probability to keep the last item should always be P(k/n0) with n0 being the value of n at that time. I don't believe you can do better than that.

If you know some minorant of n (a value you can guarantee n is greater that it), just mix the two methods above. Start with a list created using minorant instead of n, then continue as for unknown n.

Upvotes: 1

Lu4
Lu4

Reputation: 15030

It depends on whether you have the random values generated or not, if you do, than it is possible, if not you will have to generate them, and you will need around from 2*k to 3*k operation in that case

Upvotes: 0

Related Questions