Reputation: 31
I am using different python to fit density functions on a dataset. This data set is made of positive time values starting from 1 second.
I tested different density functions from scipy.statistics
and the powerlaw
library, as well as my own functions using scipy.optimize
's function curve_fit()
.
So far, I obtained the best results when fitting the following "modified" power law function :
def funct(x, alpha, x0):
return((x+x0)**(-alpha))
My code is as follow :
bins = range(1,int(s_distrib.max())+2,1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]
param_bounds=([0,-np.inf],[np.inf,np.inf])
fit = opt.curve_fit(funct,
x_data,
y_data,
bounds=param_bounds) # you can pass guess for the parameters/errors
alpha,x0 = fit[0]
print(fit[0])
C = 1/integrate.quad(lambda t: funct(t,alpha,x0),1,np.inf)[0]
# Calculate fitted PDF and error with fit in distribution
pdf = [C*funct(x,alpha,x0) for x in x_data]
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)
fig, ax = plt.subplots(figsize=(6,4))
ax.loglog(x_data, y_data, basex=10, basey=10,linestyle='None', marker='.')
ax.loglog(x_data, pdf, basex=10, basey=10,linestyle='None', marker='.')
The fitting returns a value of 8.48 for x0, and of 1.40 for alpha. In the loglog plot, the data and fit plot look like this :
opt.curve_fit
when changing (x+x0) to (x-x0) in the funct
function ? Since my bounds for x0 are (-inf, +inf), I was expecting the fitting to return -8.48./anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal This is separate from the ipykernel package so we can avoid doing imports until ValueError: Residuals are not finite in the initial point.
This is a lot of questions as I am very unfamiliar with the subject, any comment and answer, even partial, will be very appreciated!
Upvotes: 0
Views: 4853
Reputation: 20080
Is (x+x0)^(-alpha) a standard distribution?
To answer your second question, yes, it is standard distribution, called Zipf distribution. It is implemented in Python/NumPy as well.
What does the x0 value represents
this is shift parameter. Any distribution on top of standard parameters (like power parameter in Zipf) might have shift and scale parameters, which basically says your X values are measured in different units with different origin point.
Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution.
This is how Zipf law is defined, from 0 to Infinity. Shifting it means your origin would be different
Upvotes: 1