emax

Reputation: 7245

Python: how to fit a gamma distribution from data?

I have a dataset and I am trying to find out which distribution fits it best.

In my first attempt I tried to fit it with a Rayleigh distribution:

y, x = np.histogram(data, bins=45, normed=True)
param = rayleigh.fit(y) # distribution fitting
# fitted distribution
xx = linspace(0,45,1000)
pdf_fitted = rayleigh.pdf(xx,loc=param[0],scale=param[1])
pdf = rayleigh.pdf(xx,loc=0,scale=8.5)

fig,ax = plt.subplots(figsize=(7,5))
plot(xx,pdf,'r-', lw=5, alpha=0.6, label='rayleigh pdf')
plot(xx,pdf,'k-', label='Data')
plt.bar(x[1:], y)
ax.set_xlabel('Distance, '+r'$x [km]$',size = 15)
ax.set_ylabel('Frequency, '+r'$P(x)$',size=15)
ax.legend(loc='best', frameon=False)

[Image: plot produced by the code above]

I am trying to do the same with a gamma distribution, without success:

y, x = np.histogram(net1['distance'], bins=45, normed=True)
xx = linspace(0,45,1000)
ag,bg,cg = gamma.fit(y)  
pdf_gamma = gamma.pdf(xx, ag, bg,cg)  
fig,ax = plt.subplots(figsize=(7,5))
# fitted distribution
plot(xx,pdf_gamma,'r-', lw=5, alpha=0.6, label='gamma pdf')
plot(xx,pdf_gamma,'k-')
plt.bar(x[1:], y, label='Data')
ax.set_xlabel('Distance, '+r'$x [km]$',size = 15)
ax.set_ylabel('Frequency, '+r'$P(x)$',size=15)
ax.legend(loc='best', frameon=False)

[Image: plot produced by the code above]

Upvotes: 7

Views: 11533

Answers (2)

mchrgr2000

Reputation: 69

Unfortunately scipy.stats.gamma is not well documented.

Suppose you have some "raw" data in the form data=array([a1,a2,a3,.....]); these could be the results of an experiment of yours.

You can pass these raw values to the fit method: a,b,c = gamma.fit(data) returns three parameters. These are the shape, the location ("loc") and the scale of the gamma curve that best fits the distribution (the histogram) of your data, not the individual data values.

I noticed from questions online that many people get this confused: they already have the distribution (a histogram) of their data and try to fit that with gamma.fit. This is wrong.

The method gamma.fit expects your raw data, not the distribution of your data. This will presumably solve the problem for a few of us.
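
As a rough illustration of the point above, here is a minimal sketch that fits on raw samples and only afterwards builds a histogram for comparison. The synthetic data, the variable names and the choice of floc=0 are assumptions made for the example; they are not taken from the question.

```python
import numpy as np
from scipy.stats import gamma

# Synthetic stand-in for the raw measurements (assumption: the real data
# would be something like net1['distance'] from the question).
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=5000)

# Fit on the raw samples, NOT on histogram heights.
# floc=0 pins the location at zero (an assumption that suits non-negative
# distances); drop it to let scipy estimate loc as well.
shape, loc, scale = gamma.fit(data, floc=0)

# The histogram is built only to compare it with the fitted pdf.
heights, edges = np.histogram(data, bins=45, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf_at_centers = gamma.pdf(centers, shape, loc=loc, scale=scale)

print(f"shape={shape:.2f}, loc={loc:.2f}, scale={scale:.2f}")
print("largest gap between histogram and fitted pdf:",
      np.max(np.abs(heights - pdf_at_centers)))
```

To draw the comparison, something like plt.bar(centers, heights, width=np.diff(edges)) followed by plt.plot(centers, pdf_at_centers) should work. Note that density=True is used here because recent NumPy releases no longer accept the older normed=True argument.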

GR

Upvotes: 3

JulienD

Reputation: 7293

My guess is that much of your original data is at 0, so the alpha (shape) of the fit ends up lower than 1 (0.34) and you get the decreasing shape with a singularity at 0. The bar plot does not include the zero (x[1:]), so you don't see the huge bar on the left.

Could that be right?
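
A quick way to check what a shape below 1 does to the curve is sketched here. It uses the 0.34 quoted above and assumes loc=0 and scale=1, since the other fitted values are not given.

```python
import numpy as np
from scipy.stats import gamma

alpha = 0.34  # shape value quoted in the answer
x = np.linspace(0.01, 20, 500)

# loc=0 and scale=1 are assumed here; only the shape comes from the answer.
pdf = gamma.pdf(x, alpha)

# With shape < 1 the density falls monotonically as x grows...
print("strictly decreasing:", bool(np.all(np.diff(pdf) < 0)))

# ...and grows without bound as x approaches 0.
near_zero = gamma.pdf(1e-9, alpha)
print("pdf very close to zero:", near_zero)
```

If the fitted shape is below 1, the curve cannot show the hump seen in the histogram, which is consistent with the guess above.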

Upvotes: 1