Chris Butcher

Reputation: 83

Generating a random number with weighted probability - 'Distribution' gem

I would like to create a random number generator, that generates a random decimal number:

I'm terrifically poor at mathematics but my research seems to tell me I want to pull a random number from a Cumulative Distribution Function resembling a Fisher–Snedecor (F) pattern, a bit like this one:

http://cdn.app.compendium.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/6303a2314437d8fcf2f72d9a56b1293a/f_distribution_probability.png

I am using a Ruby gem called Distribution (https://github.com/sciruby/distribution) to try and achieve this. It looks like the right tool, but I'm having a terrible time trying to understand how to use it to achieve the desired outcome :( Any help please.

Upvotes: 3

Views: 2578

Answers (3)

Cary Swoveland

Reputation: 110675

Sometimes you know which distribution applies because of the nature of the data. If, for example, the random variable is the sum of independent, identical Bernoulli (two-state) random variables, you know it has a binomial distribution, which can be approximated by a Normal distribution. When, as here, that does not apply, you can use a continuous distribution, shaped by its parameters, or simply use a discrete distribution. Others have made suggestions for using various continuous distributions, so I'll pass on some remarks about using a discrete distribution.

Suppose the discrete probability density function (strictly speaking, a probability mass function) were the following:

pdf = [[0.5, 0.03], [1.0, 0.06], [1.5, 0.10], [ 2.0, 0.15], [2.5 , 0.15], [ 3.0, 0.10],
       [4.0, 0.11], [6.0, 0.14], [9.0, 0.10], [12.0, 0.03], [14.0, 0.02], [15.0, 0.01]] 


pdf.map(&:last).reduce(:+)
  #=> 1.0

This could be interpreted as there being a probability of 0.03 that the random variable will be less than 0.5, a 0.06 probability of the random variable being greater than or equal to 0.5 and less than 1.0, and so on.

A discrete pdf might be constructed from historical data or by sampling, an advantage it has over using a continuous distribution. It can be made arbitrarily fine by increasing the number of intervals.

Next convert the pdf to a cumulative distribution function:

cum = 0.0
cdf = pdf.map { |k,v| [k, cum += v] }
  #=> [[0.5, 0.03], [1.0, 0.09], [1.5, 0.19], [2.0, 0.34], [2.5, 0.49], [3.0, 0.59],
  #    [4.0, 0.7], [6.0, 0.84], [9.0, 0.94], [12.0, 0.97], [14.0, 0.99], [15.0, 1.0]] 

Now use Kernel#rand to generate pseudo-random variates that are at least 0.0 and less than 1.0, and use Enumerable#find to associate each random variate with a cdf key:

def rnd(cdf)
  r = rand
  cdf.find { |k,v| r < v }.first
end

Note that cdf.find { |k,v| rand < v }.first would produce erroneous results, since rand is executed for each key-value pair of cdf.
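One edge case may be worth guarding against: summing floats can leave the last cumulative value a hair below 1.0, in which case find returns nil for a large enough variate and .first raises. The following is only a sketch of a more defensive lookup, not part of the original answer. The names build_cdf and draw are made up for illustration, and Array#bsearch is swapped in for find, which assumes the cumulative values are non-decreasing.

```ruby
# Build a cumulative table from [value, probability] pairs.
def build_cdf(pdf)
  cum = 0.0
  pdf.map { |value, prob| [value, cum += prob] }
end

# Draw one value from the table. The variate r can be passed in so that
# the lookup itself can be checked deterministically.
def draw(cdf, r = rand)
  # Find-minimum binary search: returns the first pair whose cumulative
  # probability exceeds r, or nil if there is none.
  pair = cdf.bsearch { |_, cum| r < cum }
  # If rounding left the final cumulative value just below 1.0,
  # fall back to the last entry instead of failing on nil.
  (pair || cdf.last).first
end

pdf = [[0.5, 0.03], [1.0, 0.06], [1.5, 0.10], [2.0, 0.15], [2.5, 0.15], [3.0, 0.10],
       [4.0, 0.11], [6.0, 0.14], [9.0, 0.10], [12.0, 0.03], [14.0, 0.02], [15.0, 0.01]]
cdf = build_cdf(pdf)
draw(cdf) # one of the values in the first column of pdf
```

For a table this small the binary search makes no practical difference; the nil fallback is the part that matters.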

Let's try it 100,000 times, recording the relative frequencies:

n = 100_000
inc = 1.0/n

n.times.with_object(Hash.new(0.0)) { |_, h| h[rnd(cdf)] += inc }.
  sort.
  map { |k,v| [k, v.round(5)] }.to_h
  #=> { 0.5=>0.03053, 1.0=>0.05992, 1.5=>0.10084, 2.0=>0.14959, 2.5=>0.15024,
  #     3.0=>0.10085, 4.0=>0.10946, 6.0=>0.13923, 9.0=>0.09919, 12.0=>0.03073, 
  #    14.0=>0.01931, 15.0=>0.01011} 

Upvotes: 1

Casper

Reputation: 34308

Here's a very crude, unscientific, non-mathy attempt at using the F-distribution with the parameters you gave in the F-function image (3 and 36).

First I calculate the F-value at which the CDF equals 0.975 (100% minus 2.5%), which will correspond to 15, the upper end of your range. The p_value method does that:

> F_15 = Distribution::F.p_value(0.975, 3, 36)
=> 3.5046846420861977

Next we simply use a multiplier so that when we calculate the CDF it will return the value 15 when the F-value is F_15.

> M = 15 / F_15
=> 4.27998565687528

And now we can generate random numbers with rand, which has a range of 0..1 like so:

[M * Distribution::F.p_value(rand, 3, 36), 15].min

The question is: will this function be close to the number 2 with a 45% probability? Well, sort of. You need to pick the right parameters for the F-distribution to tweak the curve (or just adjust the multiplier M). But here's a sample with the parameters from your image:

0.step(0.99, 0.02).map { |n| 
  sprintf("%0.2f", M * Distribution::F.p_value(n, 3, 36)) 
}

Gives you:

["0.00", "0.26", "0.42", "0.57", "0.70", "0.83", "0.95", "1.07", 
 "1.20", "1.31", "1.43", "1.55", "1.67", "1.80", "1.92", "2.04", 
 "2.17", "2.30", "2.43", "2.56", "2.70", "2.84", "2.98", "3.13", 
 "3.28", "3.44", "3.60", "3.77", "3.95", "4.13", "4.32", "4.52", 
 "4.73", "4.95", "5.18", "5.43", "5.69", "5.97", "6.28", "6.61", 
 "6.97", "7.37", "7.81", "8.32", "8.90", "9.60", "10.45", "11.56",
 "13.14", "15.90"]

Upvotes: 2

Severin Pappadeux

Reputation: 20080

I'll take it back, there is no rng call for F. So, if you want to use the Distribution gem, what I would propose is to use Chi2 with 4 degrees of freedom.

The mode of Chi2 with k degrees of freedom is equal to k-2, so for 4 d.f. you'll get the mode at 2, see here. My Ruby is rusty, bear with me.

require 'distribution'
normal = Distribution::Normal.rng(0)

g1 = normal.call
g2 = normal.call
g3 = normal.call
g4 = normal.call

chi2 = g1*g1 + g2*g2 + g3*g3 + g4*g4

UPDATE

You have to truncate it at 15, so if the generated chi2 is greater than 15, just reject it and generate another one. Though I would say you won't see many values above 15; check the graphs for the PDF/CDF.
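The reject-and-retry step could be sketched roughly as below. The helper name truncated is made up for illustration. It takes a block that produces one untruncated variate per call, so it does not depend on how the chi2 value is generated.

```ruby
# Rejection-based truncation: keep drawing until a value is at most max.
# The block supplies one untruncated variate per call.
def truncated(max)
  loop do
    x = yield
    return x if x <= max
  end
end

# Hypothetical usage with the generator from the answer:
#   truncated(15.0) { g = Array.new(4) { normal.call }; g.sum { |v| v * v } }
```

Unlike clamping with min, rejection does not pile the discarded tail probability onto the value 15 itself.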

UPDATE II

And if you want to get samples from F, make a generic Chi2 generator for d degrees of freedom from the code above, and just sample the ratio of two chi2 variates, each divided by its degrees of freedom, check here

chi2_d1 = DChi2(d1)
chi2_d2 = DChi2(d2)

f = (chi2_d1.call / d1) / (chi2_d2.call / d2)
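DChi2 above is pseudocode. One way it might be filled in is sketched below, using only the standard library: normals come from a Box-Muller transform rather than from the gem's Normal.rng, and the names std_normal, chi2_generator and f_generator are made up for illustration.

```ruby
# Standard normal variate via the Box-Muller transform.
def std_normal(rng)
  u1 = 1.0 - rng.rand # lies in (0, 1], so the logarithm is defined
  u2 = rng.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Plays the role of DChi2(d): returns a lambda that yields the sum of
# d squared standard normals, i.e. a Chi2 variate with d degrees of freedom.
def chi2_generator(d, rng = Random.new)
  -> { d.times.sum { std_normal(rng)**2 } }
end

# F variate as the ratio of two independent Chi2 variates,
# each scaled by its degrees of freedom.
def f_generator(d1, d2, rng = Random.new)
  num = chi2_generator(d1, rng)
  den = chi2_generator(d2, rng)
  -> { (num.call / d1) / (den.call / d2) }
end

f = f_generator(3, 36)
f.call # a non-negative float
```

Passing a seeded Random (for example Random.new(42)) as rng makes the stream reproducible.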

UPDATE III

And, frankly, I don't see how you could get the F distribution working for you. It is ok at 0, but the mode is equal to (d1-2)/d1 * d2/(d2 + 2), and it is hard to see it equal to 2. The graph you provided has its mode at about 1/3.
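The mode formula is easy to check numerically. In the small sketch below, f_mode is a made-up helper name and the formula is taken to hold for d1 > 2. Both factors are below 1, so the mode of an unscaled F distribution always stays below 1.

```ruby
# Mode of the F distribution with d1 and d2 degrees of freedom (d1 > 2).
def f_mode(d1, d2)
  raise ArgumentError, "mode formula requires d1 > 2" unless d1 > 2
  ((d1 - 2.0) / d1) * (d2.to_f / (d2 + 2))
end

f_mode(3, 36) # (1/3) * (36/38), roughly 0.316
```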

Upvotes: 2
