Ong Jun Keat
Ong Jun Keat

Reputation: 13

Fitting Gamma distribution in Python

I'm finding the parameters of Gamma distribution for a small sample. Later, I need to use the parameters to predict future data. However, the result shown incorrect answer.

This is the result I got from Excel and it was the correct answer I'm looking for Alpha 0.458718895 Beta 96.76626573

import scipy.stats as stats 
data=[0.0621,0.046,0.0324,0.0279]
fit_alpha, fit_loc, fit_beta=stats.gamma.fit(data,floc=0)
print(fit_alpha, fit_loc, fit_beta)
ll=[1,2,3,4,5,6,7,8,9,10]
plop=stats.gamma.pdf(ll,fit_alpha, fit_loc, fit_beta)
print(plop)

Expected result: 6.29% 4.28% 3.40% 2.88% 2.53% 2.27% 2.06% 1.90% 1.76% 1.65%

Upvotes: 1

Views: 4841

Answers (2)

Jean A.
Jean A.

Reputation: 301

I think your are confusing "sample" and "PDF value at some points"

If you consider that your data is a sample i.e. 4 draws from a Gamma law then the fitting will give something like that (I use OpenTURNS platform)

import openturns as ot

sample = ot.Sample([[x] for x in data])
gamma_fitting = ot.GammaFactory().build(sample)
print (gamma_fitting)
>>> Gamma(k = 1.49938, lambda = 79.5426, gamma = 0.02325)

Plotting the result will show you that your data corresponds to the fitting if your data (4 input numbers) were on the abscissa axis.

enter image description here

In fact, you were looking for the Gamma that verifies :

PDF([1,2,3,4]) ~ [ 0.0621, 0.046, 0.0324, 0.0279 ] = data

Upvotes: 0

mikuszefski
mikuszefski

Reputation: 4043

You are using the fit the wrong way. You try to fit the PDF while scipy.stat is fitting the best underlying distribution to random data. Have a look here:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats 
from scipy.optimize import leastsq

def my_res( params, yData ):
    a, b = params
    xList= range( 1, len(yData) + 1 )
    th = np.fromiter( ( stats.gamma.pdf( x, a, loc=0, scale=b ) for x in  xList ), np.float )
    diff = th - np.array( yData )
    return diff

data = [ 0.0621, 0.046, 0.0324, 0.0279 ]

### this does not work as data is supposed to be the random variate data and not the pdf
fit_alpha, fit_loc, fit_beta = stats.gamma.fit(data, floc=0 )
print 'data fitted the wrong way:'
print(fit_alpha, fit_loc, fit_beta)


#### but making a least square fit with the pdf works
sol, err = leastsq( my_res, [.4, 1 ], args=( data, ) )
print '...and the right way:'
print sol
datath = [ stats.gamma.pdf( x, sol[0], loc=0, scale=sol[1]) for x in range(1,5) ]

### the result gives the expected answer
ll=[1,2,3,4,5,6,7,8,9,10]
plop=stats.gamma.pdf(ll, sol[0], loc=0, scale=sol[1])
print 'expected values:'
print(plop)


### if we generate random numbers with gamma distribution
### the fit does what is should

testData = stats.gamma.rvs(sol[0], loc=0, scale=sol[1], size=5000 )
print 'using stats.gamma.fit the correct way:'
print stats.gamma.fit( testData, floc=0 )

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( data , ls='', marker='x')
ax.plot( datath , ls='', marker='^')

plt.show()

providing

>> data fitted the wrong way:
>> (10.36700043818477, 0, 0.00406096249836482)
>> ...and the right way:
>> [ 0.45826569 96.8498341 ]
>> expected values:
>> [0.06298405 0.04282212 0.0340243  0.02881519 0.02527189 0.02265992 0.02063036 0.01899356 0.01763645 0.01648688]
>> using stats.gamma.fit the correct way:
>> (0.454884062189886, 0, 94.94258888249479)

enter image description here

Upvotes: 2

Related Questions