Reputation: 83

Generating a random number with weighted probability - 'Distribution' gem

I would like to create a random number generator that produces a random decimal number:

Greater than 0.0
Less than 15.0
Where the probability of that number being close to 2.0 is relatively high
The probability of it being near 15.0 or very close to zero is very low

I'm terrifically poor at mathematics but my research seems to tell me I want to pull a random number from a Cumulative Distribution Function resembling a Fisher–Snedecor (F) pattern, a bit like this one:

http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png

I am using a Ruby gem called Distribution (https://github.com/sciruby/distribution) to try and achieve this. It looks like the right tool, but I'm having a terrible time trying to understand how to use it to achieve the desired outcome :( Any help please.

Upvotes: 3

Answers (3)

Cary Swoveland

Reputation: 110675

Sometimes you know which distribution applies because of the nature of the data. If, for example, the random variable is the sum of independent, identical Bernoulli (two-state) random variables, you know the former has a binomial distribution, which can be approximated by a Normal distribution. When, as here, that does not apply, you can use a continuous distribution, shaped by its parameters, or simply use a discrete distribution. Others have made suggestions for using various continuous distributions, so I'll pass on some remarks about using a discrete distribution.

Suppose the discrete probability density function were the following:

pdf = [[0.5, 0.03], [1.0, 0.06], [1.5, 0.10], [ 2.0, 0.15], [2.5 , 0.15], [ 3.0, 0.10],
       [4.0, 0.11], [6.0, 0.14], [9.0, 0.10], [12.0, 0.03], [14.0, 0.02], [15.0, 0.01]] 


pdf.map(&:last).reduce(:+)
  #=> 1.0

This could be interpreted as there being a probability of 0.03 that the random variable will be less than 0.5, a 0.06 probability of the random variable being greater than or equal to 0.5 and less than 1.0, and so on.

A discrete pdf might be constructed from historical data or by sampling, an advantage it has over using a continuous distribution. It can be made arbitrarily fine by increasing the number of intervals.

Next convert the pdf to a cumulative distribution function:

cum = 0.0
cdf = pdf.map { |k,v| [k, cum += v] }
  #=> [[0.5, 0.03], [1.0, 0.09], [1.5, 0.19], [2.0, 0.34], [2.5, 0.49], [3.0, 0.59],
  #    [4.0, 0.7], [6.0, 0.84], [9.0, 0.94], [12.0, 0.97], [14.0, 0.99], [15.0, 1.0]]

Now use Kernel#rand to generate pseudo-random variates between 0.0 and 1.0, and use Enumerable#find to associate each random variate with a cdf key:

def rnd(cdf)
  r = rand
  cdf.find { |k,v| r < v }.first
end

Note that cdf.find { |k,v| rand < v }.first would produce erroneous results, since rand is executed for each key-value pair of cdf.

Let's try it 100,000 times, recording the relative frequencies:

n = 100_000
inc = 1.0/n

n.times.with_object(Hash.new(0.0)) { |_, h| h[rnd(cdf)] += inc }.
  sort.
  map { |k,v| [k, v.round(5)] }.to_h
  #=> { 0.5=>0.03053, 1.0=>0.05992, 1.5=>0.10084, 2.0=>0.14959, 2.5=>0.15024,
  #     3.0=>0.10085, 4.0=>0.10946, 6.0=>0.13923, 9.0=>0.09919, 12.0=>0.03073, 
  #    14.0=>0.01931, 15.0=>0.01011}
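
The rnd method above returns only the twelve bucket keys. If a continuous decimal is wanted instead, one option is to pick a bucket the same way and then draw uniformly inside it. The following is a minimal sketch under that assumption (uniform spread within each bucket, with the first bucket starting at 0.0); the names build_buckets and sample_continuous are made up for illustration and are not part of the answer.

```ruby
# Upper bucket edges paired with bucket probabilities (values from the answer).
pdf = [[0.5, 0.03], [1.0, 0.06], [1.5, 0.10], [2.0, 0.15], [2.5, 0.15], [3.0, 0.10],
       [4.0, 0.11], [6.0, 0.14], [9.0, 0.10], [12.0, 0.03], [14.0, 0.02], [15.0, 0.01]]

# Build [lower_edge, upper_edge, cumulative_probability] triples.
# Assumption: the first bucket starts at 0.0.
def build_buckets(pdf)
  lower = 0.0
  cum = 0.0
  pdf.map do |upper, prob|
    bucket = [lower, upper, cum += prob]
    lower = upper
    bucket
  end
end

# Pick a bucket by cumulative probability, then draw uniformly inside it.
def sample_continuous(buckets, rng = Random.new)
  r = rng.rand
  # Fall back to the last bucket in case floating-point rounding leaves the
  # final cumulative value slightly below 1.0.
  lower, upper, _cum = buckets.find { |_lo, _hi, c| r < c } || buckets.last
  lower + rng.rand * (upper - lower)
end

buckets = build_buckets(pdf)
samples = Array.new(20_000) { sample_continuous(buckets) }
share_near_two = samples.count { |x| x >= 1.5 && x < 2.5 }.fdiv(samples.size)
```

With the pdf above, share_near_two should land near 0.30, the combined weight of the two buckets between 1.5 and 2.5.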

Upvotes: 1

Casper

Reputation: 34308

Here's a very crude, unscientific, non-mathy attempt at using the F-distribution with the parameters you gave in the F-function image (3 and 36).

First I calculate what F-value is needed for the CDF to be 0.975 (100% - 2.5%). That F-value will mark the upper end of your range, the number 15.

To calculate that we can use the p_value method like so:

> F_15 = Distribution::F.p_value(0.975, 3, 36)
=> 3.5046846420861977

Next we simply use a multiplier M, chosen so that the scaled result is 15 when the F-value is F_15.

> M = 15 / F_15
=> 4.27998565687528

And now we can generate random numbers with rand, which has a range of 0..1 like so:

[M * Distribution::F.p_value(rand, 3, 36), 15].min

The question is: will this function be close to the number 2 with a 45% probability? Well, sort of. You need to pick the right parameters for the F-distribution to tweak the curve (or just adjust the multiplier M). But here's a sample with the parameters from your image:

0.step(0.99, 0.02).map { |n| 
  sprintf("%0.2f", M * Distribution::F.p_value(n, 3, 36)) 
}

Gives you:

["0.00", "0.26", "0.42", "0.57", "0.70", "0.83", "0.95", "1.07", 
 "1.20", "1.31", "1.43", "1.55", "1.67", "1.80", "1.92", "2.04", 
 "2.17", "2.30", "2.43", "2.56", "2.70", "2.84", "2.98", "3.13", 
 "3.28", "3.44", "3.60", "3.77", "3.95", "4.13", "4.32", "4.52", 
 "4.73", "4.95", "5.18", "5.43", "5.69", "5.97", "6.28", "6.61", 
 "6.97", "7.37", "7.81", "8.32", "8.90", "9.60", "10.45", "11.56",
 "13.14", "15.90"]

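Clamping with [..., 15].min piles the top 2.5% of draws onto exactly 15. One possible alternative, sketched here and not part of the answer, is to draw the probability from [0, 0.975) instead of [0, 1), so the scaled value stays below 15 without clamping. The helper truncated_sampler is a hypothetical name, and an exponential quantile function stands in for the gem so the sketch runs with the standard library only.

```ruby
# Build a sampler that uses inverse-transform sampling on the lowest `p_max`
# share of a distribution, scaled so that quantile(p_max) maps to `upper`.
# `quantile` is any callable mapping a probability in [0, 1) to a value.
def truncated_sampler(quantile, p_max, upper, rng = Random.new)
  scale = upper / quantile.call(p_max) # same role as M in the answer
  -> { scale * quantile.call(rng.rand * p_max) }
end

# Stand-in quantile function (exponential distribution), used only so this
# runs without the gem. With the gem, the answer's call could be wrapped as:
#   quantile = ->(p) { Distribution::F.p_value(p, 3, 36) }
exp_quantile = ->(p) { -Math.log(1.0 - p) }

sampler = truncated_sampler(exp_quantile, 0.975, 15.0)
samples = Array.new(10_000) { sampler.call }
```

The shape of the result still depends entirely on the quantile function and the choice of p_max, so the parameters would need tuning as described above.
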
Upvotes: 2

Severin Pappadeux

Reputation: 20080

I'll take it back, there is no rng call for F. So, if you want to use the Distribution gem, what I would propose is to use Chi² with 4 degrees of freedom.

The mode of Chi² with k degrees of freedom is equal to k-2 (for k ≥ 2), so for 4 d.f. you'll get the mode at 2. My Ruby is rusty, bear with me.

require 'distribution'
normal = Distribution::Normal.rng(0)

g1 = normal.call
g2 = normal.call
g3 = normal.call
g4 = normal.call

chi2 = g1*g1 + g2*g2 + g3*g3 + g4*g4

UPDATE

You have to truncate it at 15, so if the generated chi2 is greater than 15, just reject it and generate another one. That said, you won't see many values above 15; check the graphs of the PDF/CDF.
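
The reject-and-retry step could look roughly like the sketch below. It assumes a Box-Muller transform in place of the gem's Normal.rng so that it runs with the standard library only; the method names std_normal, chi2 and truncated_chi2 are made up for illustration.

```ruby
# Standard normal variate via the Box-Muller transform (stdlib only; this
# stands in for Distribution::Normal.rng from the code above).
def std_normal(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Chi-squared variate with k degrees of freedom: sum of k squared normals.
def chi2(k, rng)
  Array.new(k) { std_normal(rng)**2 }.sum
end

# Rejection step: redraw until the value falls below the upper bound.
def truncated_chi2(k, upper, rng = Random.new)
  loop do
    x = chi2(k, rng)
    return x if x < upper
  end
end

samples = Array.new(20_000) { truncated_chi2(4, 15.0) }
mean = samples.sum / samples.size
```

Since very few Chi² values with 4 d.f. exceed 15, the loop rarely repeats and the sample mean stays close to 4.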

UPDATE II

And if you want to get samples from F, make a generic Chi² generator for d degrees of freedom from the code above, and just sample the ratio of two Chi² variates, each divided by its degrees of freedom:

chi2_d1 = DChi2(d1)
chi2_d2 = DChi2(d2)

f = (chi2_d1.call / d1) / (chi2_d2.call / d2)
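
DChi2 above is a placeholder. A runnable version might look like the following sketch, which again assumes a Box-Muller normal generator instead of the gem; chi2_rng is a hypothetical name, and d1 = 3, d2 = 36 are taken from the graph in the question.

```ruby
# Standard normal variate via the Box-Muller transform (stdlib only).
def std_normal(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Returns a lambda producing Chi-squared variates with d degrees of freedom.
def chi2_rng(d, rng = Random.new)
  -> { Array.new(d) { std_normal(rng)**2 }.sum }
end

d1 = 3
d2 = 36
chi2_d1 = chi2_rng(d1)
chi2_d2 = chi2_rng(d2)

# F variate: ratio of two independent Chi-squared variates, each divided by
# its degrees of freedom.
f_rng = -> { (chi2_d1.call / d1) / (chi2_d2.call / d2) }

samples = Array.new(20_000) { f_rng.call }
mean = samples.sum / samples.size
```

As a sanity check, the mean of an F distribution is d2 / (d2 - 2) for d2 > 2, so the sample mean here should come out near 36 / 34, about 1.06.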

UPDATE III

And, frankly, I don't see how you could get the F distribution working for you. It is ok at 0, but the mode is equal to (d1-2)/d1 * d2/(d2 + 2) for d1 > 2, and both factors are less than 1, so it is hard to see it equal to 2. The graph you provided has its mode at about 1/3.
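
The mode formula above is easy to check numerically. This is a small sketch with a hypothetical helper f_mode; the last line shows what multiplier would be needed to move the mode to 2 if the variate is rescaled, which is an illustration and not something the answer recommends.

```ruby
# Mode of the F distribution, using the formula above (valid for d1 > 2).
def f_mode(d1, d2)
  raise ArgumentError, "formula requires d1 > 2" unless d1 > 2

  ((d1 - 2).to_f / d1) * (d2.to_f / (d2 + 2))
end

mode = f_mode(3, 36)           # (1/3) * (36/38), about 0.316
large = f_mode(1_000, 1_000)   # still below 1, however large the parameters
scale_for_mode_two = 2.0 / mode
```

For the parameters from the graph (3 and 36), a rescaled variate would need a multiplier of roughly 6.3 to put its mode at 2.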

Upvotes: 2
