Reputation: 83
I would like to create a random number generator, that generates a random decimal number:
I'm terrifically poor at mathematics but my research seems to tell me I want to pull a random number from a Cumulative Distribution Function resembling a Fisher–Snedecor (F) pattern, a bit like this one:
I am using a Ruby gem called Distribution (https://github.com/sciruby/distribution) to try and achieve this. It looks like the right tool, but I'm having a terrible time trying to understand how to use it to achieve the desired outcome :( Any help please.
Upvotes: 3
Views: 2578
Reputation: 110675
Sometimes you know which distribution applies because of the nature of the data. If, for example, the random variable is the sum of independent, identical Bernoulli (two-state) random variables, you know the former has a binomial distribution, which can be approximated by a Normal distribution. When, as here, that does not apply, you can use a continuous distribution, shaped by its parameters, or simply use a discrete distribution. Others have made suggestions for using various continuous distributions, so I'll pass on some remarks about using a discrete distribution.
Suppose the discrete probability density function were the following:
pdf = [[0.5, 0.03], [1.0, 0.06], [1.5, 0.10], [2.0, 0.15], [2.5, 0.15], [3.0, 0.10],
       [4.0, 0.11], [6.0, 0.14], [9.0, 0.10], [12.0, 0.03], [14.0, 0.02], [15.0, 0.01]]
pdf.map(&:last).reduce(:+)
#=> 1.0
This could be interpreted as there being a probability of 0.03 that the random variable will be less than 0.5, a 0.06 probability of the random variable being greater than or equal to 0.5 and less than 1.0, and so on.
A discrete pdf might be constructed from historical data or by sampling, an advantage it has over using a continuous distribution. It can be made arbitrarily fine by increasing the number of intervals.
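As a rough illustration of building such a pdf from observations, here is a minimal sketch. The helper name `empirical_pdf`, the sample values and the interval edges are all made up for the example; it simply counts how many samples fall below each upper edge and divides by the total.

```ruby
# Hypothetical helper: bin observed samples into intervals whose upper
# edges are given, returning [[upper_edge, relative_frequency], ...].
def empirical_pdf(samples, edges)
  counts = Hash.new(0)
  samples.each do |x|
    edge = edges.find { |e| x < e }  # first interval whose upper edge exceeds x
    counts[edge] += 1 if edge        # samples at or above the last edge are ignored
  end
  total = counts.values.sum.to_f
  edges.map { |e| [e, counts[e] / total] }
end

samples = [0.2, 0.7, 0.8, 1.2, 1.4, 1.9, 2.1, 2.4]  # made-up observations
edges   = [0.5, 1.0, 1.5, 2.0, 2.5]                 # made-up interval edges
pdf_from_data = empirical_pdf(samples, edges)
```

The result has the same shape as the hand-written pdf, so it can be fed into the same steps that follow.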
Next convert the pdf to a cumulative distribution function:
cum = 0.0
cdf = pdf.map { |k,v| [k, cum += v] }
#=> [[0.5, 0.03], [1.0, 0.09], [1.5, 0.19], [2.0, 0.34], [2.5, 0.49], [3.0, 0.59],
# [4.0, 0.7], [6.0, 0.84], [9.0, 0.94], [12.0, 0.97], [14.0, 0.99], [15.0, 1.0]]
Now use Kernel#rand to generate pseudo-random variates between 0.0 and 1.0 and use Enumerable#find to associate the random variate with a cdf key:
def rnd(cdf)
  r = rand
  cdf.find { |k,v| r < v }.first
end
Note that cdf.find { |k,v| rand < v }.first would produce erroneous results, since rand would be executed anew for each key-value pair of cdf.
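One more thing worth guarding against: the cumulative sums are floating-point, so the last cdf value may end up a hair below 1.0, in which case find could return nil for a draw very close to 1. The sketch below is one way to handle that. The name `rnd_safe`, the explicit Random argument and the seed are my own additions, not part of the code above; it also uses Array#bsearch, which is valid here because the cumulative values are increasing.

```ruby
pdf = [[0.5, 0.03], [1.0, 0.06], [1.5, 0.10], [2.0, 0.15], [2.5, 0.15], [3.0, 0.10],
       [4.0, 0.11], [6.0, 0.14], [9.0, 0.10], [12.0, 0.03], [14.0, 0.02], [15.0, 0.01]]

cum = 0.0
cdf = pdf.map { |k, v| [k, cum += v] }

# Hypothetical variant of rnd: takes an explicit Random so runs can be
# reproduced, and falls back to the last key if rounding leaves the final
# cumulative value slightly below 1.0.
def rnd_safe(cdf, rng = Random.new)
  r = rng.rand                          # a single draw per sample
  pair = cdf.bsearch { |_, v| r < v }   # first pair whose cumulative value exceeds r
  (pair || cdf.last).first
end

rng   = Random.new(42)                  # arbitrary seed
draws = Array.new(10) { rnd_safe(cdf, rng) }
```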
Let's try it 100,000 times, recording the relative frequencies:
n = 100_000
inc = 1.0/n
n.times.with_object(Hash.new(0.0)) { |_, h| h[rnd(cdf)] += inc }.
  sort.
  map { |k,v| [k, v.round(5)] }.to_h
#=> { 0.5=>0.03053, 1.0=>0.05992, 1.5=>0.10084, 2.0=>0.14959, 2.5=>0.15024,
# 3.0=>0.10085, 4.0=>0.10946, 6.0=>0.13923, 9.0=>0.09919, 12.0=>0.03073,
# 14.0=>0.01931, 15.0=>0.01011}
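Since rnd only ever returns the interval edges, and the question asks for a decimal number, one possible extension is to pick the interval from the cdf and then draw uniformly inside it. This is only a sketch under my own assumptions: values are uniform within each interval, the first interval starts at 0.0, and the helper name `rnd_continuous` and the shortened cdf are invented for the example.

```ruby
# Shortened, made-up cdf in the same [upper_edge, cumulative_probability] form.
cdf = [[1.0, 0.25], [2.0, 0.75], [15.0, 1.0]]

# Hypothetical helper: choose an interval via the cdf, then draw uniformly
# between its lower and upper edges (the first interval is assumed to start at 0.0).
def rnd_continuous(cdf, rng = Random.new)
  r = rng.rand
  i = cdf.index { |_, v| r < v } || (cdf.size - 1)  # fall back to the last interval
  upper = cdf[i].first
  lower = i.zero? ? 0.0 : cdf[i - 1].first
  lower + (upper - lower) * rng.rand
end

rng     = Random.new(3)                             # arbitrary seed
samples = Array.new(1_000) { rnd_continuous(cdf, rng) }
```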
Upvotes: 1
Reputation: 34308
Here's a very crude, unscientific, non-mathy attempt at using the F-distribution with the parameters you gave in the F-function image (3 and 36).
First I calculate what F-value is needed for the CDF to be 0.975 (100% - 2.5% for the upper end of the range for your number 15). To calculate that we can use the p_value method like so:
> F_15 = Distribution::F.p_value(0.975, 3, 36)
=> 3.5046846420861977
Next we simply use a multiplier so that the scaled result is 15 when the F-value is F_15.
> M = 15 / F_15
=> 4.27998565687528
And now we can generate random numbers with rand, which has a range of 0..1, like so:
[M * Distribution::F.p_value(rand, 3, 36), 15].min
The question is: will this function be close to the number 2 with a 45% probability? Well, sort of. You need to pick the right parameters for the F-distribution to tweak the curve (or just adjust the multiplier M). But here's a sample with the parameters from your image:
0.step(0.99, 0.02).map { |n|
sprintf("%0.2f", M * Distribution::F.p_value(n, 3, 36))
}
Gives you:
["0.00", "0.26", "0.42", "0.57", "0.70", "0.83", "0.95", "1.07",
"1.20", "1.31", "1.43", "1.55", "1.67", "1.80", "1.92", "2.04",
"2.17", "2.30", "2.43", "2.56", "2.70", "2.84", "2.98", "3.13",
"3.28", "3.44", "3.60", "3.77", "3.95", "4.13", "4.32", "4.52",
"4.73", "4.95", "5.18", "5.43", "5.69", "5.97", "6.28", "6.61",
"6.97", "7.37", "7.81", "8.32", "8.90", "9.60", "10.45", "11.56",
"13.14", "15.90"]
Upvotes: 2
Reputation: 20080
I'll take it back, there is no rng call for F. So, if you want to use the Distribution gem, what I would propose is to use Chi2 with 4 degrees of freedom.
The mode for Chi2 with k degrees of freedom is equal to k-2, so for 4 d.f. you'll get the mode at 2, see here. My Ruby is rusty, bear with me:
require 'distribution'
normal = Distribution::Normal.rng(0)
g1 = normal.call
g2 = normal.call
g3 = normal.call
g4 = normal.call
chi2 = g1*g1 + g2*g2 + g3*g3 + g4*g4
UPDATE
You have to truncate it at 15, so if the generated chi2 is greater than 15 just reject it and generate another one. Though I would say you won't see a lot of values above 15, check graphs for PDF/CDF.
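A minimal sketch of that reject-and-redraw loop, written without the gem so it runs on its own. The Box-Muller helper `std_normal` is a pure-Ruby stand-in for Distribution::Normal.rng, and the names `chi2_4` and `truncated_chi2_4` and the seed are made up for illustration.

```ruby
# Standard normal variate via the Box-Muller transform; a pure-Ruby
# stand-in for Distribution::Normal.rng so the sketch has no dependencies.
def std_normal(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Chi-squared variate with 4 degrees of freedom: sum of four squared normals.
def chi2_4(rng)
  Array.new(4) { std_normal(rng)**2 }.sum
end

# Hypothetical helper: keep redrawing until the value does not exceed max.
def truncated_chi2_4(rng, max = 15.0)
  loop do
    x = chi2_4(rng)
    return x if x <= max
  end
end

rng     = Random.new(1)  # arbitrary seed
samples = Array.new(1_000) { truncated_chi2_4(rng) }
```

Rejections should be rare here, so the loop almost always returns on the first pass.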
UPDATE II
And if you want to get samples from F, make a generic Chi2 generator for d degrees of freedom from the code above (DChi2 below stands for such a generator), and just sample the ratio of the chi2 values, check here
chi2_d1 = DChi2(d1)
chi2_d2 = DChi2(d2)
f = (chi2_d1.call / d1) / (chi2_d2.call / d2)
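To make that concrete, here is a self-contained sketch of the ratio construction. `chi2_rng` is my own stand-in for the DChi2 placeholder, `std_normal` is a Box-Muller substitute for Distribution::Normal.rng, and the degrees of freedom (3 and 36, as in the question's image) and the seed are just example values.

```ruby
# Standard normal variate via Box-Muller (pure-Ruby stand-in for the gem's rng).
def std_normal(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Hypothetical stand-in for DChi2(d): returns a lambda producing chi-squared
# variates with d degrees of freedom as a sum of d squared normals.
def chi2_rng(d, rng)
  -> { Array.new(d) { std_normal(rng)**2 }.sum }
end

d1, d2  = 3, 36
rng     = Random.new(7)  # arbitrary seed
chi2_d1 = chi2_rng(d1, rng)
chi2_d2 = chi2_rng(d2, rng)

# F variate: ratio of the two chi-squared variates, each divided by its d.f.
f_rng     = -> { (chi2_d1.call / d1) / (chi2_d2.call / d2) }
f_samples = Array.new(2_000) { f_rng.call }
```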
UPDATE III
And, frankly, I don't see how you could get the F distribution working for you. It is ok at 0, but the mode is equal to (d1-2)/d1 * d2/(d2 + 2), and it is hard to see it equal to 2. The graph you provided has its mode at about 1/3.
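The mode formula is easy to check numerically. In the sketch below the helper name `f_mode` is mine; the formula only applies for d1 > 2. Both factors are below 1, so without some rescaling the mode of an F distribution cannot reach 2.

```ruby
# Mode of the F distribution with d1 and d2 degrees of freedom (needs d1 > 2).
def f_mode(d1, d2)
  raise ArgumentError, "mode formula needs d1 > 2" unless d1 > 2
  ((d1 - 2.0) / d1) * (d2 / (d2 + 2.0))
end

mode_3_36 = f_mode(3, 36)  # about 0.316, consistent with a mode near 1/3
```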
Upvotes: 2