dromodel
dromodel

Reputation: 10203

How to efficiently pick all items one at a time randomly with weights from a collection?

I've got a collection of objects that I'd like to examine one at a time until all objects have been examined. Each object is selected by a predefined weight with many likely duplicates. The end result will be an ordered list of items from the collection. What are efficient ways to get this list?

Consider, for example, the following balls with specified volumes:

A: 2
B: 3
C: 25
D: 100

Let's add 4 A balls, 3 B balls, 1 C ball, and 2 D balls to a bag. Supposing that the probability of drawing a particular ball is proportional to its volume then the probability of drawing a particular D ball at this point is 100/242 (they have the same weight but aren't identical). Suppose that this D was drawn and continue. The odds of drawing a C at this point are 25/142, since the D ball was removed previously. Suppose that you drew the C ball here and continue. Continue drawing until all the balls have been removed so that you have a sequence like DCDBABBA.

Upvotes: 0

Views: 91

Answers (1)

j_random_hacker
j_random_hacker

Reputation: 51226

[EDIT: Updated to add individual ball numbers]

Suppose there are n balls of k different types. Create a k-element array of (balltype, weight, count) triples representing the initial state. As you do this, add each weight[i] * count[i] to total, which starts out at 0.

First, set up an array of ball numbers for each ball type:

  1. For i from 1 to k:
    • Create a count[i]-element array b[i], assigning j to b[i][j] for 1 <= j <= count[i].

Now randomly pick a ball. The following steps can be repeated n times to pick all balls in some random order:

  1. Choose a random integer r between 0 and total - 1, inclusive.
  2. Set p = 0.
  3. For i from 1 to k:
    • Add weight[i] * count[i] to p.
    • If r < p:
      • We have picked a ball of type balltype[i]. Output it.
      • Choose a random integer c between 1 and count[i], inclusive.
      • Output b[i][c] as the ball number.
      • Decrease count[i] by 1.
      • Set b[i][c] = b[i][count[i]]. This keeps the unused ball numbers "dense".
      • Set total = total - weight[i].
      • Stop.

To pick out all n balls will take O(nk) time. This can be sped up by roughly a factor of 2 by moving the last entry in the array of triples to position i whenever count[i] reaches 0 (i.e. when all balls of type i have been used up) and decreasing n by 1, however for the code that chooses ball numbers to continue working, either the entire array b[n] must also be copied to b[i], or another layer of indirection must be used.

Upvotes: 1

Related Questions