rhcpps
rhcpps

Reputation: 31

Power law distribution fitting in Python

I am using different python to fit density functions on a dataset. This data set is made of positive time values starting from 1 second.

I tested different density functions from scipy.statistics and the powerlaw library, as well as my own functions using scipy.optimize's function curve_fit().

So far, I obtained the best results when fitting the following "modified" power law function :

def funct(x, alpha, x0):
    return((x+x0)**(-alpha))

My code is as follow :

bins = range(1,int(s_distrib.max())+2,1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]

param_bounds=([0,-np.inf],[np.inf,np.inf])
fit = opt.curve_fit(funct,
                    x_data,
                    y_data,
                    bounds=param_bounds) # you can pass guess for the parameters/errors
alpha,x0 = fit[0]
print(fit[0])

C = 1/integrate.quad(lambda t: funct(t,alpha,x0),1,np.inf)[0]

# Calculate fitted PDF and error with fit in distribution
pdf = [C*funct(x,alpha,x0) for x in x_data]
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)

fig, ax = plt.subplots(figsize=(6,4))
ax.loglog(x_data, y_data, basex=10, basey=10,linestyle='None',  marker='.')
ax.loglog(x_data, pdf, basex=10, basey=10,linestyle='None',  marker='.')

The fitting returns a value of 8.48 for x0, and of 1.40 for alpha. In the loglog plot, the data and fit plot look like this :

plot

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal This is separate from the ipykernel package so we can avoid doing imports until ValueError: Residuals are not finite in the initial point.

This is a lot of questions as I am very unfamiliar with the subject, any comment and answer, even partial, will be very appreciated!

Upvotes: 0

Views: 4853

Answers (1)

Severin Pappadeux
Severin Pappadeux

Reputation: 20080

Is (x+x0)^(-alpha) a standard distribution?

To answer your second question, yes, it is standard distribution, called Zipf distribution. It is implemented in Python/NumPy as well.

What does the x0 value represents

this is shift parameter. Any distribution on top of standard parameters (like power parameter in Zipf) might have shift and scale parameters, which basically says your X values are measured in different units with different origin point.

Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution.

This is how Zipf law is defined, from 0 to Infinity. Shifting it means your origin would be different

Upvotes: 1

Related Questions