Reputation: 13
I'm finding the parameters of Gamma distribution for a small sample. Later, I need to use the parameters to predict future data. However, the result shown incorrect answer.
This is the result I got from Excel and it was the correct answer I'm looking for Alpha 0.458718895 Beta 96.76626573
import scipy.stats as stats
data=[0.0621,0.046,0.0324,0.0279]
fit_alpha, fit_loc, fit_beta=stats.gamma.fit(data,floc=0)
print(fit_alpha, fit_loc, fit_beta)
ll=[1,2,3,4,5,6,7,8,9,10]
plop=stats.gamma.pdf(ll,fit_alpha, fit_loc, fit_beta)
print(plop)
Expected result: 6.29% 4.28% 3.40% 2.88% 2.53% 2.27% 2.06% 1.90% 1.76% 1.65%
Upvotes: 1
Views: 4841
Reputation: 301
I think your are confusing "sample" and "PDF value at some points"
If you consider that your data is a sample i.e. 4 draws from a Gamma law then the fitting will give something like that (I use OpenTURNS platform)
import openturns as ot
sample = ot.Sample([[x] for x in data])
gamma_fitting = ot.GammaFactory().build(sample)
print (gamma_fitting)
>>> Gamma(k = 1.49938, lambda = 79.5426, gamma = 0.02325)
Plotting the result will show you that your data corresponds to the fitting if your data (4 input numbers) were on the abscissa axis.
In fact, you were looking for the Gamma that verifies :
PDF([1,2,3,4]) ~ [ 0.0621, 0.046, 0.0324, 0.0279 ] = data
Upvotes: 0
Reputation: 4043
You are using the fit
the wrong way. You try to fit the PDF while scipy.stat
is fitting the best underlying distribution to random data. Have a look here:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
from scipy.optimize import leastsq
def my_res( params, yData ):
a, b = params
xList= range( 1, len(yData) + 1 )
th = np.fromiter( ( stats.gamma.pdf( x, a, loc=0, scale=b ) for x in xList ), np.float )
diff = th - np.array( yData )
return diff
data = [ 0.0621, 0.046, 0.0324, 0.0279 ]
### this does not work as data is supposed to be the random variate data and not the pdf
fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data, floc=0 )
print 'data fitted the wrong way:'
print(fit_alpha, fit_loc, fit_beta)
#### but making a least square fit with the pdf works
sol, err = leastsq( my_res, [.4, 1 ], args=( data, ) )
print '...and the right way:'
print sol
datath = [ stats.gamma.pdf( x, sol[0], loc=0, scale=sol[1]) for x in range(1,5) ]
### the result gives the expected answer
ll=[1,2,3,4,5,6,7,8,9,10]
plop=stats.gamma.pdf(ll, sol[0], loc=0, scale=sol[1])
print 'expected values:'
print(plop)
### if we generate random numbers with gamma distribution
### the fit does what is should
testData = stats.gamma.rvs(sol[0], loc=0, scale=sol[1], size=5000 )
print 'using stats.gamma.fit the correct way:'
print stats.gamma.fit( testData, floc=0 )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( data , ls='', marker='x')
ax.plot( datath , ls='', marker='^')
plt.show()
providing
>> data fitted the wrong way:
>> (10.36700043818477, 0, 0.00406096249836482)
>> ...and the right way:
>> [ 0.45826569 96.8498341 ]
>> expected values:
>> [0.06298405 0.04282212 0.0340243 0.02881519 0.02527189 0.02265992 0.02063036 0.01899356 0.01763645 0.01648688]
>> using stats.gamma.fit the correct way:
>> (0.454884062189886, 0, 94.94258888249479)
Upvotes: 2