Reputation: 115
I'm trying to use curve_fit to fit my data to a sum of two logarithms.
import numpy as np
from scipy.optimize import curve_fit

def func_2log(x, a, b, c, d, e):
    return a * np.log(x + 1 + b) + c * np.log(x + 1 + d) + e

p, cov = curve_fit(func_2log, x, y, p0=initial_guess, sigma=yerr, maxfev=100000)
However, when I print the covariance matrix, I get:
[[ -2.23883493e+06 -3.92893042e+09 -1.87846128e+01 -4.27371608e+02 2.46442543e+07]
[ -3.92893042e+09 -6.89487710e+12 -3.29629278e+04 -7.49919180e+05 4.32481902e+10]
[ -1.87846014e+01 -3.29629077e+04 -1.53314974e-04 -3.43241587e-03 2.06772574e+02]
[ -4.27371198e+02 -7.49918462e+05 -3.43241462e-03 -7.58284163e-02 4.70429813e+03]
[ 2.46442543e+07 4.32481902e+10 2.06772700e+02 4.70430264e+03 -2.71274697e+08]]
How is this matrix mathematically possible? The diagonal entries are supposed to be variances, so how can a parameter be negatively correlated with itself?
Edit: I didn't want to include the data itself because x and y each have about 90,000 elements. x goes from 0 to 87244 in steps of 1, and y ranges from 2 to 7.
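For anyone who wants to run the snippet above, here is a hypothetical stand-in for the data, synthetic and matching only the ranges described (the names x, y, yerr and initial_guess are what the snippet expects):

import numpy as np

# Synthetic stand-in, NOT the real data: x runs 0..87244 in unit steps,
# y rises smoothly from 2 to 7 with a little noise.
rng = np.random.default_rng(0)
x = np.arange(87245, dtype=float)
y = 2.0 + 5.0 * np.log(x + 1.0) / np.log(x[-1] + 1.0) + rng.normal(0.0, 0.05, x.size)
yerr = np.full_like(y, 0.05)
initial_guess = [0.2, 0.0, 0.2, 0.0, 2.0]

Note that with data like this the two log terms are nearly redundant, which is exactly the kind of degeneracy the answer below points to.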
Thanks for any help in advance.
Upvotes: 2
Views: 2070
Reputation: 35115
The computation inside curve_fit for the covariance contains this:

cov_x = inv(dot(transpose(R), R))

where R^T R is an approximation to the Hessian produced by the optimization algorithm. Looks positive definite, right?
The result indeed is positive definite in exact arithmetic. However, what is probably happening in your case is that the approximation has a very high condition number, so rounding errors in computing the inverse result in a loss of positive definiteness. Indeed, the condition number of the matrix you give above is ~ 10^21, whereas double precision only carries about 16 significant digits.
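You can verify both points directly from the matrix in the question (re-typed from the printed output, so the values are only approximate):

import numpy as np

# Covariance matrix as printed in the question (re-typed, hence approximate).
cov = np.array([
    [-2.23883493e+06, -3.92893042e+09, -1.87846128e+01, -4.27371608e+02,  2.46442543e+07],
    [-3.92893042e+09, -6.89487710e+12, -3.29629278e+04, -7.49919180e+05,  4.32481902e+10],
    [-1.87846014e+01, -3.29629077e+04, -1.53314974e-04, -3.43241587e-03,  2.06772574e+02],
    [-4.27371198e+02, -7.49918462e+05, -3.43241462e-03, -7.58284163e-02,  4.70429813e+03],
    [ 2.46442543e+07,  4.32481902e+10,  2.06772700e+02,  4.70430264e+03, -2.71274697e+08],
])

print(np.linalg.cond(cov))      # ~1e21: far beyond what float64 can resolve
print(np.linalg.eigvalsh(cov))  # eigenvalues of mixed sign: not positive definite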
If so, what this probably means in practice is that the estimated variance is infinite for some linear combination of the parameters, which prevents obtaining reliable estimates for any of the covariances.
(For example, if the best fit is obtained for a = 0, then b becomes ill-defined, which may ruin the covariance estimates.)
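A minimal sketch of that degeneracy, using the model from the question: with a = 0 the model output is independent of b, so the data cannot constrain it.

import numpy as np

def func_2log(x, a, b, c, d, e):
    return a * np.log(x + 1 + b) + c * np.log(x + 1 + d) + e

x = np.arange(100.0)
# With a = 0, wildly different values of b give the identical curve.
y1 = func_2log(x, 0.0, 1.0, 0.5, 0.0, 2.0)
y2 = func_2log(x, 0.0, 99.0, 0.5, 0.0, 2.0)
print(np.allclose(y1, y2))  # True: b is unidentifiable, so its variance blows up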
Upvotes: 3