Reputation: 253
I have a set S={a1,a2,a3,a4,a5,......,an}. The probability with which each of the element is selected is {p1,p2,p3,p4,p5,...,pn} respectively (where ofcourse p1+p2+p3+p4+p5+....+pn=1}.
I want to simulate an experiment which does that. However I wish to do that without any libraries (i.e from first principles)
I'm using the following method: 1) I map the elements on the real number line as follows X(a1)=1; X(a2)=2; X(a3)=3; X(a4)=4; X(a5)=5;....,X(an)=n
2) Then I calculate the cumulative probability distribution function for each coordinate (i.e P(x < X) as follows: cdf(x)= P(a1) + P(a2) + .....P(ai) such that X(ai) <= x < X(a(i+1))
(thus the cdf is a step function)
3) I randomly select an real number,q between (0,1). And calculate the x-coordinate where the line y = q intersects the cdf. Since the cdf is a step function with jumps at 1,2,...n the point would have an integer x-coordinate btw 1 and n. Let the x-coordinate be m.
4) I select that ai, such that X(ai) = m.
My question is does this method simulate the experiment without any bias?
I'm not getting the required results, which is why i'm a bit skeptical.
Any help will be greatly appreciated! Thanks!
Upvotes: 2
Views: 1428
Reputation: 5567
The logic sounds ok. Generally to sample an arbitrary distribution function Y(x) from a uniform distribution U(0,1), just lookup the uniform random value u in the Y vector and return the least value of x with Y(x) greater than or equal to u i.e. min{x:Y(x)>=u}
.
You may want to add an x=0 observation for the base probability as in the example below.
x P(x) Y(x)
0 0 0
1 0.1 0.1
2 0.3 0.4
3 0.4 0.8
4 0.2 1
eg u = 0.3 -> x = 2
, u = 0.81 -> x = 4
, etc.
Clearly calculating relative frequencies over many trials will give unbiased estimates of P(x).
Upvotes: 3