Adwaitvedant
Adwaitvedant

Reputation: 253

Sampling from discrete probability distribution from first principles

I have a set S={a1,a2,a3,a4,a5,......,an}. The probability with which each of the element is selected is {p1,p2,p3,p4,p5,...,pn} respectively (where ofcourse p1+p2+p3+p4+p5+....+pn=1}.

I want to simulate an experiment which does that. However I wish to do that without any libraries (i.e from first principles)

I'm using the following method: 1) I map the elements on the real number line as follows X(a1)=1; X(a2)=2; X(a3)=3; X(a4)=4; X(a5)=5;....,X(an)=n

2) Then I calculate the cumulative probability distribution function for each coordinate (i.e P(x < X) as follows: cdf(x)= P(a1) + P(a2) + .....P(ai) such that X(ai) <= x < X(a(i+1))

(thus the cdf is a step function)

3) I randomly select an real number,q between (0,1). And calculate the x-coordinate where the line y = q intersects the cdf. Since the cdf is a step function with jumps at 1,2,...n the point would have an integer x-coordinate btw 1 and n. Let the x-coordinate be m.

4) I select that ai, such that X(ai) = m.

My question is does this method simulate the experiment without any bias?

I'm not getting the required results, which is why i'm a bit skeptical.

Any help will be greatly appreciated! Thanks!

Upvotes: 2

Views: 1428

Answers (1)

lori_m
lori_m

Reputation: 5567

The logic sounds ok. Generally to sample an arbitrary distribution function Y(x) from a uniform distribution U(0,1), just lookup the uniform random value u in the Y vector and return the least value of x with Y(x) greater than or equal to u i.e. min{x:Y(x)>=u}.

You may want to add an x=0 observation for the base probability as in the example below.

x      P(x)    Y(x)
0      0       0
1      0.1     0.1
2      0.3     0.4
3      0.4     0.8
4      0.2     1

eg u = 0.3 -> x = 2, u = 0.81 -> x = 4, etc.

Clearly calculating relative frequencies over many trials will give unbiased estimates of P(x).

Upvotes: 3

Related Questions