Reputation: 5266
Suppose we have a set x
of N
values {x_i; i=1,...,N}
and a set of some associated probabilities {w_i; i=1,...,N}
.
We want to get from the set x
, a new set x^
of N
values {x^_i; i=1,...,N}
by choosing each value x_i
from the setx
according to the probability w_i
. How do we code that (i.e. a pseudo code algorithm, that can be translated to any language).
EDIT: python code:
def resample(self,x,w):
N = len(w)
new_x = empty(N)
c = cumsum(w)
for i in range(N):
r = random()
for j in range(N):
if( j == N-1 ):
new_x[i] = x[j]
break
else:
if( (c[j] <= r) and (r < c[j+1]) ):
new_x[i] = x[j+1]
break
new_w = ones(N,dtype=float)/N
return new_x, new_w
Upvotes: 0
Views: 207
Reputation: 113
# import the random library
import random
# define the input data
X = ["A","B","C","D"]
w = [0.2,0.3,0.4,0.1]
# size of the new sample
n = 10
# empy list to store the result
Xp = []
# the actual code
while len(Xp) < n:
random_choice = random.choice(w)
if random_choice >= random.random():
Xp.append(X[w.index(random_choice)])
# have a look
Xp
Out[39]:
['C', 'C', 'C', 'B', 'D', 'B', 'A', 'D', 'A', 'B']
Upvotes: 0
Reputation: 826
I think the best option is preprocessing the probability set and then getting a random value.
Let me explain what I mean:
First you create a new set, for example h_i in which you place the accumulated probability of each object.
x_i:{A,B,C,D}
w_i:{0.2,0.3,0.4,0.1}
h_i:{0.2,0.5,0.9,1}
The last element is of course 1. (but if it is not (you have missing cases) it still works.
Now you generate a random number 0≤r≤1 and lookup the first element whose h is greater than r.
For example if you get 0.56 you choose C because 0.9(h_C) > 0.56 and 0.5(h_B) ≤ 0.56
This operation can be expensive on arrays but if you choose a binary search tree for the storage of the set h_i you can get very good results.
That is if you want to choose lots of random values over the same probability set. If the set is constantly changing I would use another approach.
Upvotes: 1
Reputation: 346
You can call a function which gives you a random number between 0 and 1.
If the probabilities are w_1 = 0.2, w_2 = 0.5, w_3 = 0.3, you can:
Choose x_1 if you got a number between 0 and 0.2
Choose x_2 if you got a number between 0.2 and 0.7
Choose x_3 otherwise.
More generally, choose x_n if w_1 + ... + w_(n-1) <= random number < w_1 + ... + w_(n-1) + w_n
That's not the whole pseudocode, just an explanation of its most problematic part, but I think it should be enough if you have a basic understanding of your problem.
Upvotes: 4