Reputation: 7245
I have a dataset and I am trying to find out which distribution it follows best.
On the first attempt I tried to fit it with a Rayleigh distribution:
y, x = np.histogram(data, bins=45, density=True)
param = rayleigh.fit(y) # distribution fitting
# fitted distribution
xx = linspace(0,45,1000)
pdf_fitted = rayleigh.pdf(xx,loc=param[0],scale=param[1])
pdf = rayleigh.pdf(xx,loc=0,scale=8.5)
fig,ax = plt.subplots(figsize=(7,5))
plot(xx,pdf,'r-', lw=5, alpha=0.6, label='rayleigh pdf')
plot(xx,pdf,'k-', label='Data')
plt.bar(x[1:], y)
ax.set_xlabel('Distance, '+r'$x [km]$',size = 15)
ax.set_ylabel('Frequency, '+r'$P(x)$',size=15)
ax.legend(loc='best', frameon=False)
I am trying to do the same with a gamma distribution, without succeeding:
y, x = np.histogram(net1['distance'], bins=45, density=True)
xx = linspace(0,45,1000)
ag,bg,cg = gamma.fit(y)
pdf_gamma = gamma.pdf(xx, ag, bg,cg)
fig,ax = plt.subplots(figsize=(7,5))
# fitted distribution
plot(xx,pdf_gamma,'r-', lw=5, alpha=0.6, label='gamma pdf')
plot(xx,pdf_gamma,'k-')
plt.bar(x[1:], y, label='Data')
ax.set_xlabel('Distance, '+r'$x [km]$',size = 15)
ax.set_ylabel('Frequency, '+r'$P(x)$',size=15)
ax.legend(loc='best', frameon=False)
Upvotes: 7
Views: 11533
Reputation: 69
Unfortunately, scipy.stats.gamma is not well documented.
Suppose you have some "raw" data in the form data = array([a1, a2, a3, ...]), for example the results of an experiment.
You can pass these raw values to the fit method: a, b, c = gamma.fit(data) returns three parameters. These are the shape, the location (loc) and the scale of the gamma curve that best fits the DISTRIBUTION (the normalized histogram) of your data, not the individual data points.
I have noticed from questions online that many people confuse the two: they first compute the distribution (histogram) of their data and then try to fit those histogram values with gamma.fit. This is wrong.
The method gamma.fit expects your raw data, not the distribution of your data. This will presumably solve the problem for a few of us.
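As a rough illustration of the point above, here is a minimal sketch that fits raw samples and only uses the histogram for comparison. The synthetic `data` array, the seed, and the choice of `floc=0` (pinning the location at zero, which seems reasonable for non-negative distances) are assumptions made for this example, not part of the original question.

```python
import numpy as np
from scipy.stats import gamma

# Hypothetical stand-in for the raw measurements: gamma samples with
# shape 2 and scale 4 (the real data from the question is not available).
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=4.0, size=5000)

# Fit the RAW samples, not the histogram heights.
# floc=0 is an assumption: it fixes the location parameter at zero.
shape, loc, scale = gamma.fit(data, floc=0)

# The histogram is only used to compare against the fitted density.
heights, edges = np.histogram(data, bins=45, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf_at_centers = gamma.pdf(centers, shape, loc=loc, scale=scale)

print("shape, loc, scale:", shape, loc, scale)
```

With the samples fitted this way, `pdf_at_centers` can be drawn over a bar plot of `heights` at `centers` to check the fit visually. Dropping `floc=0` returns a free location parameter as well, as described in the answer.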
GR
Upvotes: 3
Reputation: 7293
My guess is that much of the original data sits at 0, so the alpha (shape parameter) of the fit ends up below 1 (0.34), and you get the decreasing shape with a singularity at 0. The bar plot does not include the zero (it uses x[1:]), so you don't see the huge bar on the left.
Could that be right?
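To make the effect of a shape parameter below 1 concrete, the following sketch compares the gamma density for the reported alpha of 0.34 with one for a shape above 1. The grid, the unit scale, and the comparison shape of 2.0 are arbitrary choices for illustration only.

```python
import numpy as np
from scipy.stats import gamma

# Grid that starts just above zero, because the density is unbounded at 0
# when the shape parameter is below 1.
xx = np.linspace(1e-6, 10, 1000)

# Shape 0.34 is the value quoted in the answer; 2.0 is an arbitrary
# comparison value chosen for this sketch.
pdf_low = gamma.pdf(xx, 0.34)   # shape < 1: strictly decreasing, blows up near 0
pdf_high = gamma.pdf(xx, 2.0)   # shape > 1: rises to a peak, then decays

print("shape 0.34 is decreasing everywhere:", bool(np.all(np.diff(pdf_low) < 0)))
print("shape 2.0 peaks near x =", xx[np.argmax(pdf_high)])
```

A fitted curve that looks like `pdf_low` is therefore a hint that the values handed to the fit were concentrated at or near zero.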
Upvotes: 1