Reputation: 327
I'm trying to get a discrete approximation to a Gaussian distribution for n >= 2.
So, for example, if n = 2 then the discrete approximation would be [0.5, 0.5],
when n = 3 it would be [0.25, 0.5, 0.25],
and when n = 4 it would be [0.125, 0.375, 0.375, 0.125].
You get my point, I hope.
The returned discrete approximation array should always sum to 1, like any probability distribution.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import math
import scipy
from random import randint

def discrete_gauss(n):
    g = [0.5, 0.5]
    f = g
    for x in range(1, n - 1):
        f = np.convolve(f, g)
    if sum(f) != 1:
        print("The distribution sum is not 1.")
    else:
        return f
Now 'discrete_gauss' works great when I use 1 < n < 68, but when I enter n > 67 the sum of f differs from 1 (sometimes more, sometimes less) and I don't know why. Does anyone have any clue?
Sorry for the messy question, I tried to keep it short. I'll be happy to clarify anything that is unclear. Thanks.
Upvotes: 1
Views: 1319
Reputation: 13999
Read this paper about the challenges of using floating point math, and then reconsider your approach.
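To see the problem concretely, here's a minimal sketch (assuming the same repeated-convolution approach as your code; n = 68 is just the first case your post says goes wrong) showing how rounding error accumulates:

import numpy as np

f = [0.5, 0.5]
for _ in range(66):               # same work as discrete_gauss(68) in your code
    f = np.convolve(f, [0.5, 0.5])

print(sum(f) == 1.0)              # may be False: rounding error has accumulated
print(abs(sum(f) - 1.0))          # a tiny drift, typically a few multiples of machine epsilon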
Here's an alternative procedure for generating your desired "distribution" that avoids the floating-point rounding errors in the summation that np.convolve performs:
import numpy as np
import scipy.special as sps

def discrete_gauss(n):
    f = np.array([sps.comb(n - 1, i, exact=True) for i in range(n)], dtype='O')
    f = np.float64(f)/np.float64(f).sum()

    if not np.allclose(f.sum(), 1.0):
        raise ValueError("The distribution sum is not close to 1.\n"
                         "f.sum(): %s" % f.sum())

    return f
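For example, this reproduces the n = 4 case from the question:

print(discrete_gauss(4))          # [0.125 0.375 0.375 0.125]
print(discrete_gauss(4).sum())    # 1.0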
The sequence you want is equivalent to the nth level of Pascal's triangle (see the figure at the top of the Wikipedia article on the binomial theorem), normalized so that it can represent a probability. The above solution uses standard Python int values (which are arbitrary precision by default in Python 3) to find the values in the nth level, then switches to floating point math only at the very end, for the normalization step (i.e. np.float64(f)/np.float64(f).sum()).
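As a quick sanity check (a small sketch using the same scipy.special.comb call as above), the integer row really is exact, so the only rounding happens in that final division:

import scipy.special as sps

n = 68
row = [sps.comb(n - 1, i, exact=True) for i in range(n)]
print(sum(row) == 2 ** (n - 1))   # True: the exact integer row sums to 2**(n-1), no rounding yet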
Note the use of not np.allclose(f.sum(), 1.0) in the check above, instead of f.sum() != 1.0. As discussed in the deeper dive below, f.sum() will be equal to 1.0 for ~90% of the values of n from 1 to 1000. However, in general you cannot assume that the result of a floating point computation will exactly match the result you'd get from an equivalent computation using real numbers (see this paper for all of the gory details). When dealing with floats you should usually (by which I mean almost always) check that a result is close (i.e. equal to within a given tolerance/error) to your expected value, rather than exactly equal to it.
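The standard toy example of why exact float comparisons are fragile:

import numpy as np

print(0.1 + 0.2 == 0.3)             # False: 0.1, 0.2 and 0.3 are not exactly representable in binary
print(np.isclose(0.1 + 0.2, 0.3))   # True: equal within the default tolerances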
This solution isn't perfect. Most values of n produce results that exactly sum to 1.0, but some don't. The following code checks the results of discrete_gauss(n) for values of n from 1 to 1000:
nnot1 = []
for n in range(1, 1001):
    if discrete_gauss(n).sum() != 1.0:
        nnot1.append(n)

print('discrete_gauss(n).sum() was not equal to 1.0 for %d values of n.' % len(nnot1))
print(nnot1)
Output:
discrete_gauss(n).sum() was not equal to 1.0 for 75 values of n.
[78, 89, 110, 114, 125, 127, 180, 182, 201, 206, 235, 248, 273, 342, 346, 348, 365, 373, 383, 390, 402, 403, 421, 427, 429, 451, 454, 471, 502, 531, 540, 556, 558, 574, 579, 584, 587, 595, 600, 609, 617, 631, 633, 647, 648, 651, 657, 669, 674, 703, 705, 728, 731, 763, 765, 772, 778, 783, 798, 816, 837, 852, 858, 860, 861, 867, 874, 877, 906, 912, 941, 947, 959, 964, 972]
Thus, for ~8% of these values, discrete_gauss(n).sum() was not exactly equal to 1.0. However, since no error was raised, np.allclose(discrete_gauss(n).sum(), 1.0) was always True.
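If you're curious how far off those sums actually are, you can measure the worst deviation over the same range (a small sketch reusing the discrete_gauss defined above; the exact number is platform-dependent, but it is typically on the order of machine epsilon):

import numpy as np

worst = max(abs(discrete_gauss(n).sum() - 1.0) for n in range(1, 1001))
print(worst)                        # worst-case drift over n = 1..1000
print(np.finfo(np.float64).eps)     # machine epsilon for float64, ~2.22e-16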
scipy.special.comb(n, k, exact=True) gives the (n, k)th binomial coefficient as a standard Python int, which is equivalent to the kth value in the nth level of Pascal's triangle.
Upvotes: 2