Srivatsan

Reputation: 9363

Fitting log-log data using scipy.optimize.curve_fit

I have two variables x and y which I am trying to fit using curve_fit from scipy.optimize.

The equation that fits the data is a simple power law of the form y = a*(x^b). The fit looks good when I set the x and y axes to log scale, i.e. ax.set_xscale('log') and ax.set_yscale('log').

Here is the code:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def fitfunc(x, p1, p2):
    y = p1*(x**p2)
    return y

popt_1, pcov_1 = curve_fit(fitfunc, x, y, p0=(1.0, 1.0))
p1_1 = popt_1[0]
p1_2 = popt_1[1]
residuals1 = y - fitfunc(x, p1_1, p1_2)   # y is the data being fitted
xi_sq_1 = np.sum(residuals1**2)           # the chi-square value

curve_y_1 = fitfunc(x, p1_1, p1_2)        # this is the fit line seen in the graph

fig = plt.figure(figsize=(14,12))
ax1 = fig.add_subplot(111)
ax1.scatter(x, y, c='r', label='data')
ax1.plot(x, curve_y_1, 'y.', linewidth=1, label='fit')
ax1.legend(loc='best', shadow=True, scatterpoints=1)
ax1.set_xscale('log') #Scale is set to log
ax1.set_yscale('log') #Scale is set to log
plt.show()

(plot: the data and the power-law fit, with both axes set to log scale)

When I use the actual log values for x and y, I rewrite the power-law fit as y = 10^(a + b*log(x)), i.e. raising 10 to the right-hand side since the logarithm is base 10. Now both my x and y values are log(x) and log(y).

The fit for the above does not seem to be good. Here is the code I have used.

def fitfunc(x,p1,p2):
    y = 10**(p1+(p2*x))
    return y

popt_1,pcov_1 = curve_fit(fitfunc,np.log10(x),np.log10(y),p0=(1.0,1.0))

p1_1 = popt_1[0]
p1_2 = popt_1[1]
residuals1 = np.log10(y) - fitfunc(np.log10(x), p1_1, p1_2)   # compare in the same (log) space as the fit
xi_sq_1 = np.sum(residuals1**2)

curve_y_1 = fitfunc(np.log10(x),p1_1,p1_2) #The fit line uses log(x) here itself

fig = plt.figure(figsize=(14,12))
ax1 = fig.add_subplot(111)
ax1.scatter(np.log10(x),np.log10(y),c='r')
ax1.plot(np.log10(x), curve_y_1, 'y.', linewidth=1)
plt.show()

(plot: log(x) vs log(y) data with the second fit)

The only difference between the two plots is the fitting equation, and for the second plot the x and y values have been logged independently. Am I doing something wrong here? I want a log(x) vs log(y) plot and the corresponding fit parameters (slope and intercept).

Upvotes: 0

Views: 3487

Answers (1)

burnpanck

Reputation: 2206

Your transformation of the power-law model to log-log is wrong, i.e. your second fit actually fits a different model. Take your original model y = a*(x^b) and apply the logarithm to both sides: you get log(y) = log(a) + b*log(x). Thus, your model in log scale should simply read y' = a' + b*x', where the primes indicate variables in log scale. The model is now a linear function, a well-known result: all power laws become straight lines in log-log space.
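A minimal sketch of that corrected fit, assuming x and y are NumPy arrays of the original (linear-scale) data, could look like this:

import numpy as np
from scipy.optimize import curve_fit

def fitfunc_loglog(logx, a_prime, b):
    # linear model in log-log space: log10(y) = log10(a) + b*log10(x)
    return a_prime + b*logx

popt, pcov = curve_fit(fitfunc_loglog, np.log10(x), np.log10(y), p0=(1.0, 1.0))
a_prime, b = popt        # a_prime = log10(a), b is the power-law exponent
a = 10**a_prime          # recover the prefactor of the original model y = a*x**b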

That said, you can still expect some small differences between the two versions of your fit, since curve_fit optimises a least-squares problem. In log scale the fit therefore minimises the relative error between fit and data, while in linear scale it minimises the absolute error. Thus, to decide which domain is actually the better one for your fit, you have to estimate the error in your data. The data you show certainly does not have constant uncertainty in log scale, so on linear scale your fit might be more faithful. If the error of each data point is known, you could consider using the sigma parameter. If it is used properly, there should not be much difference between the two approaches. In that case, I would prefer the log-scale fit, as the model is simpler and therefore likely to be more numerically stable.
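If per-point uncertainties on y are available (here called yerr, a hypothetical name not taken from the question), a sketch of such a weighted log-scale fit, reusing fitfunc_loglog from above, could be:

# hypothetical 1-sigma errors yerr on y, propagated to log10(y): d(log10 y) = dy / (y*ln(10))
sigma_logy = yerr / (y * np.log(10))

popt, pcov = curve_fit(fitfunc_loglog, np.log10(x), np.log10(y),
                       p0=(1.0, 1.0), sigma=sigma_logy, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))   # 1-sigma uncertainties on a' and b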

Upvotes: 4
