Reputation: 10203
I've got a collection of objects that I'd like to examine one at a time until all objects have been examined. Each object is selected by a predefined weight with many likely duplicates. The end result will be an ordered list of items from the collection. What are efficient ways to get this list?
Consider, for example, the following balls with specified volumes:
A: 2
B: 3
C: 25
D: 100
Let's add 4 A balls, 3 B balls, 1 C ball, and 2 D balls to a bag. Supposing that the probability of drawing a particular ball is proportional to its volume then the probability of drawing a particular D ball at this point is 100/242 (they have the same weight but aren't identical). Suppose that this D was drawn and continue. The odds of drawing a C at this point are 25/142, since the D ball was removed previously. Suppose that you drew the C ball here and continue. Continue drawing until all the balls have been removed so that you have a sequence like DCDBABBA.
Upvotes: 0
Views: 91
Reputation: 51226
[EDIT: Updated to add individual ball numbers]
Suppose there are n
balls of k
different types. Create a k
-element array of (balltype, weight, count)
triples representing the initial state. As you do this, add each weight[i] * count[i]
to total
, which starts out at 0.
First, set up an array of ball numbers for each ball type:
i
from 1 to k
:
count[i]
-element array b[i]
, assigning j
to b[i][j]
for 1 <= j <= count[i]
.Now randomly pick a ball. The following steps can be repeated n
times to pick all balls in some random order:
r
between 0 and total
- 1, inclusive.p
= 0.i
from 1 to k
:
weight[i] * count[i]
to p
.r < p
:
balltype[i]
. Output it.c
between 1 and count[i]
, inclusive.b[i][c]
as the ball number.count[i]
by 1.b[i][c] = b[i][count[i]]
. This keeps the unused ball numbers "dense".total = total - weight[i]
.To pick out all n
balls will take O(nk
) time. This can be sped up by roughly a factor of 2 by moving the last entry in the array of triples to position i
whenever count[i]
reaches 0 (i.e. when all balls of type i
have been used up) and decreasing n
by 1, however for the code that chooses ball numbers to continue working, either the entire array b[n]
must also be copied to b[i]
, or another layer of indirection must be used.
Upvotes: 1