Reputation: 78
I'm trying to fit a curve with scipy.optimize.curve_fit and it works pretty well so far, except when a value in my sigma array is zero. I understand that the algorithm can't handle this, because the residual is divided by zero in that case. From the scipy documentation:
sigma : None or M-length sequence, optional If not None, the uncertainties in the ydata array. These are used as weights in the least-squares problem i.e. minimising np.sum( ((f(xdata, *popt) - ydata) / sigma)**2 ) If None, the uncertainties are assumed to be 1.
Here's what my code looks like:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = [0.125, 0.375, 0.625, 0.875, 1.125, 1.375, 1.625, 1.875, 2.125, 2.375, 2.625, 2.875, 3.125, 3.375, 3.625, 3.875, 4.125, 4.375]
y_para = [0, 0, 0.0414, 0.2164, 0.2616, 0.4254, 0.5698, 0.5921, 0.6286, 0.6452, 0.5879, 0.6032, 0.6667, 0.6325, 0.7629, 0.7164, 0.7091, 0.7887]
err = [0, 0, 0.0391, 0.0331, 0.0943, 0.0631, 0.1219, 0.1063, 0.0912, 0.0516, 0.0365, 0.0327, 0.0227, 0.103, 0.1344, 0.0697, 0.0114, 0.0465]
def logistic_growth(x, A1, A2, x_0, p):
    return A2 + (A1-A2)/(1+(x/x_0)**p)
x_plot = np.linspace(0, 4.5, 100)
bounds_para = ([0.,0,-np.inf,-np.inf],[0.0000000001, 1,np.inf,np.inf])
paras, paras_cov = curve_fit(logistic_growth, x, y_para, bounds = bounds_para, sigma = err, absolute_sigma=True)
para_curve = logistic_growth(x_plot, *paras)
plt.figure()
plt.errorbar(x,y_para, err, color = 'b', fmt = 'o', label = "Data")
plt.plot(x_plot, para_curve, color = 'b', label = "Fit")
plt.show()
Executing this without the sigma-option in curve_fit works fine, but including it raises:
ValueError: Residuals are not finite in the initial point.
This results from the zeros in the err array. Does anyone know a way to work around this?
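For reference, the failure can be reproduced without curve_fit by evaluating the weighted residual from the quoted formula directly. This is only a sketch: the starting parameters `p0` are hypothetical values picked for illustration, and only the first three data points are used.

```python
import numpy as np

def logistic_growth(x, A1, A2, x_0, p):
    return A2 + (A1 - A2) / (1 + (x / x_0) ** p)

# First three points of the data above; two of them have err == 0.
x = np.array([0.125, 0.375, 0.625])
y = np.array([0.0, 0.0, 0.0414])
err = np.array([0.0, 0.0, 0.0391])

# Hypothetical starting parameters, chosen only for illustration.
p0 = (0.0, 0.5, 1.0, 1.0)

# Silence NumPy's divide-by-zero warning so the result can be inspected.
with np.errstate(divide="ignore", invalid="ignore"):
    weighted_residuals = (logistic_growth(x, *p0) - y) / err

# The entries with err == 0 are not finite, which is what the ValueError reports.
all_finite = bool(np.all(np.isfinite(weighted_residuals)))
print(weighted_residuals, all_finite)
```

Any nonzero residual divided by a zero uncertainty is infinite, so the sum of squares cannot be evaluated at the starting point.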
Upvotes: 0
Views: 5902
Reputation: 21663
This is what the scipy doc says about the curve_fit sigma parameter: 'These are used as weights in the least-squares problem ...' Then, in my opinion, they should be inverse to the errors. Here's what I suggest.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = [0.125, 0.375, 0.625, 0.875, 1.125, 1.375, 1.625, 1.875, 2.125, 2.375, 2.625, 2.875, 3.125, 3.375, 3.625, 3.875, 4.125, 4.375]
y_para = [0, 0, 0.0414, 0.2164, 0.2616, 0.4254, 0.5698, 0.5921, 0.6286, 0.6452, 0.5879, 0.6032, 0.6667, 0.6325, 0.7629, 0.7164, 0.7091, 0.7887]
err = [0, 0, 0.0391, 0.0331, 0.0943, 0.0631, 0.1219, 0.1063, 0.0912, 0.0516, 0.0365, 0.0327, 0.0227, 0.103, 0.1344, 0.0697, 0.0114, 0.0465]
weights = [1/max(e, 0.001) for e in err]
print(weights)
def logistic_growth(x, A1, A2, x_0, p):
    return A2 + (A1-A2)/(1+(x/x_0)**p)
x_plot = np.linspace(0, 4.5, 100)
bounds_para = ([0.,0,-np.inf,-np.inf],[0.0000000001, 1,np.inf,np.inf])
paras, paras_cov = curve_fit(logistic_growth, x, y_para, bounds = bounds_para,
                             absolute_sigma=True,
                             sigma = weights)
para_curve = logistic_growth(x_plot, *paras)
plt.figure()
plt.errorbar(x,y_para, err, color = 'b', fmt = 'o', label = "Data")
plt.plot(x_plot, para_curve, color = 'b', label = "Fit")
plt.show()
This results in the following plot, where those initial data points are made to lie very close to the fitted line.
Upvotes: 1
Reputation: 566
Why not just drop those data points? If a point has zero variance it cannot contribute in any meaningful way to your analysis.
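A minimal sketch of dropping them, assuming the data are converted to NumPy arrays so that a boolean mask can be applied (the names `mask`, `x_fit`, `y_fit` and `err_fit` are just illustrative):

```python
import numpy as np

x = np.array([0.125, 0.375, 0.625, 0.875, 1.125, 1.375, 1.625, 1.875, 2.125,
              2.375, 2.625, 2.875, 3.125, 3.375, 3.625, 3.875, 4.125, 4.375])
y_para = np.array([0, 0, 0.0414, 0.2164, 0.2616, 0.4254, 0.5698, 0.5921, 0.6286,
                   0.6452, 0.5879, 0.6032, 0.6667, 0.6325, 0.7629, 0.7164, 0.7091, 0.7887])
err = np.array([0, 0, 0.0391, 0.0331, 0.0943, 0.0631, 0.1219, 0.1063, 0.0912,
                0.0516, 0.0365, 0.0327, 0.0227, 0.103, 0.1344, 0.0697, 0.0114, 0.0465])

# Keep only the points whose reported uncertainty is strictly positive.
mask = err > 0
x_fit, y_fit, err_fit = x[mask], y_para[mask], err[mask]

print(x_fit.size, "of", x.size, "points kept")
```

The masked arrays can then be passed to curve_fit in place of x, y_para and err, while the full arrays are still used for plotting.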
Upvotes: 1