Power law distribution fitting in Python

Question

I am using different python to fit density functions on a dataset. This data set is made of positive time values starting from 1 second.

I tested different density functions from scipy.statistics and the powerlaw library, as well as my own functions using scipy.optimize's function curve_fit().

So far, I obtained the best results when fitting the following "modified" power law function :

def funct(x, alpha, x0):
    return((x+x0)**(-alpha))

My code is as follow :

bins = range(1,int(s_distrib.max())+2,1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]

param_bounds=([0,-np.inf],[np.inf,np.inf])
fit = opt.curve_fit(funct,
                    x_data,
                    y_data,
                    bounds=param_bounds) # you can pass guess for the parameters/errors
alpha,x0 = fit[0]
print(fit[0])

C = 1/integrate.quad(lambda t: funct(t,alpha,x0),1,np.inf)[0]

# Calculate fitted PDF and error with fit in distribution
pdf = [C*funct(x,alpha,x0) for x in x_data]
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)

fig, ax = plt.subplots(figsize=(6,4))
ax.loglog(x_data, y_data, basex=10, basey=10,linestyle='None',  marker='.')
ax.loglog(x_data, pdf, basex=10, basey=10,linestyle='None',  marker='.')

The fitting returns a value of 8.48 for x0, and of 1.40 for alpha. In the loglog plot, the data and fit plot look like this :

My first question is technical. Why do I get the following warning and error in opt.curve_fit when changing (x+x0) to (x-x0) in the funct function ? Since my bounds for x0 are (-inf, +inf), I was expecting the fitting to return -8.48.

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal This is separate from the ipykernel package so we can avoid doing imports until ValueError: Residuals are not finite in the initial point.

My other questions are theoritical. Is (x+x0)^(-alpha) a standard distribution? What does the x0 value represents, how to interpret this 8.48s value physically? From what I understand this means that my distribution corresponds to a shifted power law distribution? Can I consider that x0 corresponds to the xmin value classically needed when fitting data to power laws ?
Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution. However, I am wondering what is the standard way to characterise the full data with a distribution that would be a power law after xmin and something else before xmin.

This is a lot of questions as I am very unfamiliar with the subject, any comment and answer, even partial, will be very appreciated!

Power law distribution fitting in Python

Answers (1)

Related Questions