Reputation: 53
For several hours now, I have been trying to fit a model to a (generated) dataset as a casus for a problem I've been struggling with. I generated datapoints for the function f(x) = A*cos^n(x)+b, and added some noise. When I try to fit the dataset with this function and curve_fit, I get the error
./tester.py:10: RuntimeWarning: invalid value encountered in power
return Amp*(np.cos(x))**n + b
/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py:690: OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning)
The code I'm using to generate the datapoints and fit the model is the following:
#!/usr/bin/env python
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure, show, rc, plot
def f(x, Amp, n, b):
return np.real(Amp*(np.cos(x))**n + b)
x = np.arange(0, 6.28, 0.01)
randomPart = np.random.rand(len(x))-0.5
fig = figure()
sample = f(x, 5, 2, 5)+randomPart
frame = fig.add_subplot(1,1,1)
frame.plot(x, sample, label="Sample measurements")
popt, pcov = curve_fit(f, x, sample, p0=(1,1,1))
modeldata = f(x, popt[0], popt[1], popt[2])
print(modeldata)
frame.plot(x, modeldata, label="Best fit")
frame.legend()
frame.set_xlabel("x")
frame.set_ylabel("y")
show()
The noisy data is shown - see the image below.
Does any of you have a clue of what's going on? I suspect it has something to do with the power law going into the complex domain, as the real part of the function is nowhere divergent. I have tried returning only the real part of the function, setting realistic bounds in curve_fit and using a numpy array instead of a python list for p0 already as well. I'm running the latest version of scipy available to me, scipy 0.17.0-1.
Upvotes: 5
Views: 1981
Reputation: 35176
The problem is the following:
>>> (-2)**1.1
(-2.0386342710747223-0.6623924280875919j)
>>> np.array(-2)**1.1
__main__:1: RuntimeWarning: invalid value encountered in power
nan
Unlike native python floats, numpy doubles usually refuse to take part in operations leading to complex results:
>>> np.sqrt(-1)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
nan
As a quick workaround I suggest adding an np.abs
call to your function, and using appropriate bounds for fitting to make sure this doesn't give a spurious fit. If your model is near the truth and your sample (I mean the cosine in your sample) is positive, then adding an absolute value around it should be a no-op (update: I realize this is never the case, see the proper approach below).
def f(x, Amp, n, b):
return Amp*(np.abs(np.cos(x)))**n + b # only change here
With this small change I get this:
For reference, the parameters from the fit are (4.96482314, 2.03690954, 5.03709923])
comparing to the generation with (5,2,5)
.
After giving it a bit more thought I realized that the cosine will always be negative for half your domain (duh). So the workaround I suggested might be a bit problematic, or at least its correctness is non-trivial. On the other hand, thinking of your original formula containing cos(x)^n
, with negative values for cos(x)
this only makes sense as a model if n
is an integer, otherwise you get a complex result. Since we can't solve Diophantine fitting problems, we need to handle this properly.
The most proper way (by which I mean the way that is least likely to bias your data) is this: first do the fitting with a model that converts your data to complex numbers then takes the complex magnitude on output:
def f(x, Amp, n, b):
return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b
This is obviously much less efficient than my workaround, since in each fitting step we create a new mesh, and do some extra work both in the form of complex arithmetic and an extra magnitude calculation. This gives me the following fit even with no bounds set:
The parameters are (5.02849409, 1.97655728, 4.96529108)
. These are close too. However, if we put these values back into the actual model (without np.abs
), we get imaginary parts as large as -0.37
, which is not overwhelming but significant.
So the second step should be redoing the fit with a proper model---one that has an integer exponent. Take the exponent 2 which is obvious from your fit, and do a new fit with this model. I don't believe any other approach gives you a mathematically sound result. You can also start from the original popt
, hoping that it's indeed close to the truth. Of course we could use the original function with some currying, but it's much faster to use a dedicated double-specific version of your model.
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import subplots, show
def f_aux(x, Amp, n, b):
return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b
def f_real(x, Amp, n, b):
return Amp*np.cos(x)**n + b
x = np.arange(0, 2*np.pi, 0.01) # pi
randomPart = np.random.rand(len(x)) - 0.5
sample = f(x, 5, 2, 5) + randomPart
fig,(frame_aux,frame) = subplots(ncols=2)
for fr in frame_aux,frame:
fr.plot(x, sample, label="Sample measurements")
fr.legend()
fr.set_xlabel("x")
fr.set_ylabel("y")
# auxiliary fit for n value
popt_aux, pcov_aux = curve_fit(f_aux, x, sample, p0=(1,1,1))
modeldata = f(x, *popt_aux)
#print(modeldata)
print('Auxiliary fit parameters: {}'.format(popt_aux))
frame_aux.plot(x, modeldata, label="Auxiliary fit")
# check visually, test if it's close to an integer, but otherwise
n = np.round(popt_aux[1])
# actual fit with integral exponent
popt, pcov = curve_fit(lambda x,Amp,b,n=n: f_real(x,Amp,n,b), x, sample, p0=(popt_aux[0],popt_aux[2]))
modeldata = f(x, popt[0], n, popt[1])
#print(modeldata)
print('Final fit parameters: {}'.format([popt[0],n,popt[1]]))
frame.plot(x, modeldata, label="Best fit")
frame_aux.legend()
frame.legend()
show()
Note that I changed a few things in your code which doesn't really affect my point. The figure from the above, so the one that shows both the auxiliary fit and the proper one:
The output:
Auxiliary fit parameters: [ 5.02628994 2.00886409 5.00652371]
Final fit parameters: [5.0288141074549699, 2.0, 5.0009730316739462]
Just to reiterate: while there might be no visual difference between the auxiliary fit and the proper one, only the latter gives a meaningful answer to your problem.
Upvotes: 7