Reputation: 2614
I have a bunch of x, y points that represent a sigmoidal function:
x=[ 1.00094909 1.08787635 1.17481363 1.2617564 1.34867881 1.43562284
1.52259341 1.609522 1.69631283 1.78276102 1.86426648 1.92896789
1.9464453 1.94941586 2.00062852 2.073691 2.14982808 2.22808316
2.30634034 2.38456905 2.46280126 2.54106611 2.6193345 2.69748825]
y=[-0.10057627 -0.10172142 -0.10320428 -0.10378959 -0.10348456 -0.10312503
-0.10276956 -0.10170055 -0.09778279 -0.08608644 -0.05797392 0.00063599
0.08732999 0.16429878 0.2223306 0.25368884 0.26830932 0.27313931
0.27308756 0.27048902 0.26626313 0.26139534 0.25634544 0.2509893 ]
I use scipy.interpolate.UnivariateSpline() to fit a cubic spline to the data as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

s = UnivariateSpline(x, y, k=3, s=0)  # s=0 forces the spline through every point
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, s(xfit))
plt.show()
Since I specify s=0, the spline adheres completely to the data, but there are too many wiggles. Using a higher k value leads to even more wiggles.
So my questions are:

1. How do I correctly use scipy.interpolate.UnivariateSpline() to fit my data? More precisely, how do I make the spline minimise its wiggling?
2. Or should I use scipy.optimize.curve_fit() with a trial tanh(x) function instead?

Upvotes: 3
Views: 3045
Reputation: 25997
There are several options; I list a few below, and the last one seems to give the best output. Whether you should use a spline or an actual function depends on what you want to do with the result. I list two analytical functions below that could be used, but I don't know in which context the data were derived, so it is hard to say which one suits you best.
You can play with s, e.g. for s=0.005, the plot looks like this (still not extremely pretty, but you could adjust further):
But I would indeed use a "proper" function and fit it with e.g. curve_fit. The function below is still not ideal, as it is monotonically increasing, so we miss the decrease at the end; the plot looks as follows:
This is the entire code, for both the spline and the actual fit:
from scipy.interpolate import UnivariateSpline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def func(x, ymax, n, k, c):
    return ymax * x ** n / (k ** n + x ** n) + c

x = np.array([1.00094909, 1.08787635, 1.17481363, 1.2617564, 1.34867881, 1.43562284,
              1.52259341, 1.609522, 1.69631283, 1.78276102, 1.86426648, 1.92896789,
              1.9464453, 1.94941586, 2.00062852, 2.073691, 2.14982808, 2.22808316,
              2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345, 2.69748825])
y = np.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
              -0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392, 0.00063599,
              0.08732999, 0.16429878, 0.2223306, 0.25368884, 0.26830932, 0.27313931,
              0.27308756, 0.27048902, 0.26626313, 0.26139534, 0.25634544, 0.2509893])

# fit of the analytical function
popt, pcov = curve_fit(func, x, y, p0=[y.max(), 2, 2, -0.1], bounds=([0, 0, 0, -0.2], [0.4, 45, 2000, 10]))
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, func(xfit, *popt))
plt.show()

# smoothing spline with s=0.005
s = UnivariateSpline(x, y, k=3, s=0.005)
xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, s(xfit))
plt.show()
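If you specifically want the tanh(x) trial function from the question, a minimal sketch would look like the following (the parametrisation a * tanh(b * (x - c)) + d and the starting values are my own choice, not anything dictated by the data):

def tanh_func(x, a, b, c, d):
    # a: amplitude, b: steepness, c: center of the step, d: vertical offset
    return a * np.tanh(b * (x - c)) + d

popt_tanh, _ = curve_fit(tanh_func, x, y, p0=[0.19, 5.0, 1.93, 0.09])
plt.scatter(x, y)
plt.plot(xfit, tanh_func(xfit, *popt_tanh))
plt.show()

Like func above, a plain tanh is monotonic, so it will also miss the slight decrease at the right end of your data.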
A third option is to use a more advanced function that can also reproduce the decrease at the end, and differential_evolution for the fit; that seems to give the best result:
The code is as follows (using the same data as above):
from scipy.optimize import curve_fit, differential_evolution

# product of a rising and a falling logistic, plus an offset
def sigmoid_with_decay(x, a, b, c, d, e, f):
    return a * (1. / (1. + np.exp(-b * (x - c)))) * (1. / (1. + np.exp(d * (x - e)))) + f

# sum of squared residuals, minimised by differential_evolution
def error_sigmoid_with_decay(parameters, x_data, y_data):
    return np.sum((y_data - sigmoid_with_decay(x_data, *parameters)) ** 2)

res = differential_evolution(error_sigmoid_with_decay,
                             bounds=[(0, 10), (0, 25), (0, 10), (0, 10), (0, 10), (-1, 0.1)],
                             args=(x, y),
                             seed=42)

xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x, y)
plt.plot(xfit, sigmoid_with_decay(xfit, *res.x))
plt.show()
The fit is quite sensitive to the bounds, so be careful when you play with them...
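If you also want uncertainty estimates for the parameters, one option (my addition, not needed for the fit itself) is to pass the differential_evolution result to curve_fit as the initial guess:

# local polish of the global result; pcov gives a covariance estimate
popt, pcov = curve_fit(sigmoid_with_decay, x, y, p0=res.x)
print("fitted parameters:", popt)
print("parameter standard errors:", np.sqrt(np.diag(pcov)))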
Upvotes: 3
Reputation: 4647
This illustrates the result of fitting the two halves of the data to different functions: the lower half to all data with X < 2.0 and the upper half to all data with X >= 1.9, so that the fitted curves overlap. The code switches from one equation to the other at the center of the overlap region, X = 1.95.
import numpy, matplotlib
import matplotlib.pyplot as plt
xData=numpy.array([ 1.00094909, 1.08787635, 1.17481363, 1.2617564, 1.34867881, 1.43562284,
1.52259341, 1.609522, 1.69631283, 1.78276102, 1.86426648, 1.92896789,
1.9464453, 1.94941586, 2.00062852, 2.073691, 2.14982808, 2.22808316,
2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345, 2.69748825])
yData=numpy.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
-0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392, 0.00063599,
0.08732999, 0.16429878, 0.2223306, 0.25368884, 0.26830932, 0.27313931,
0.27308756, 0.27048902, 0.26626313, 0.26139534, 0.25634544, 0.2509893 ])
# function for x < 1.95 (fitted up to 2.0 for overlap)
def lowerFunc(x_in):  # Bleasdale-Nelder Power With Offset
    # coefficients
    a = -1.1431476643503597E+03
    b = 3.3819340844164983E+21
    c = -6.3633178925040745E+01
    d = 3.1481973843740194E+00
    Offset = -1.0300724909782859E-01
    temp = numpy.power(a + b * numpy.power(x_in, c), -1.0 / d)
    temp += Offset
    return temp

# function for x >= 1.95 (fitted down to 1.9 for overlap)
def upperFunc(x_in):  # rational equation with Offset
    # coefficients
    a = -2.5294212380048242E-01
    b = 1.4262697377369586E+00
    c = -2.6141935706529118E-01
    d = -8.8730045918252121E-02
    Offset = -4.8283287597672708E-01
    temp = (a * numpy.power(x_in, 2) + b * numpy.log(x_in))  # numerator
    temp /= (1.0 + c * numpy.power(numpy.log(x_in), -1) + d * numpy.exp(x_in))  # denominator
    temp += Offset
    return temp

def combinedFunc(x_in):
    returnVal = []
    for x in x_in:
        if x < 1.95:
            returnVal.append(lowerFunc(x))
        else:
            returnVal.append(upperFunc(x))
    return returnVal
modelPredictions = combinedFunc(xData)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth / 100.0, graphHeight / 100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = combinedFunc(xModel)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
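If the abrupt switch at X = 1.95 leaves a visible kink in the plotted curve, one possible refinement (a sketch of my own, not part of the fit above) is to blend the two fitted equations linearly across the overlap region 1.9 <= X < 2.0 instead of switching at a single point:

def blendedFunc(x_in):
    # weight runs from 0 at X = 1.9 (pure lowerFunc) to 1 at X = 2.0 (pure upperFunc)
    returnVal = []
    for x in x_in:
        if x < 1.9:
            returnVal.append(lowerFunc(x))
        elif x >= 2.0:
            returnVal.append(upperFunc(x))
        else:
            w = (x - 1.9) / 0.1
            returnVal.append((1.0 - w) * lowerFunc(x) + w * upperFunc(x))
    return returnVal

Both equations were fitted over the whole overlap region, so the blend only mixes them where each one is valid.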
Upvotes: 2