Reputation: 81
This is my first post here, I've spent hours looking for this answer but I can't seem to figure this out. I've used pandas to pass a .csv to a np matrix. From there I tried to apply a simple curve fit but the output I am getting is consistently wrong. the code will plot wrong fit, and will not plot the data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
df = pd.read_csv("Results.csv")
xdata = df['Frame'].as_matrix()
ydata = df['Area'].as_matrix()
def func(x, a, b, c):
return (a*np.sin(b*x))+(c * np.exp(x))
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-',
label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'g--',
label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
This is what the data looks like:
Thanks in advance for your help.
Upvotes: 4
Views: 7722
Reputation: 4657
Your model contains "exp(x)" and the data file contains x values of 1000, and this is giving math overflow errors no matter the starting values - the optimizer cannot find a way out of that problem, and you must change the equation to fit this data set. I can suggest other equations, but this data set cannot be fit to the posted equation.
EDIT: Per your comment on dividing by 100, here is code using scipy's Differential Evolution genetic algorithm module to find initial parameter estimates, which uses the Latin Hypercube algorithm to ensure a thorough search of parameter space - that algorithm requires bounds within which to search, and ranges on parameters are much easier to find than exact initial parameter values. Here I tried a few ranges and got what is probably the best fit you can get here from what I can see.
import pandas as pd
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
df = pd.read_csv("Results.csv")
xData = df['Frame'].as_matrix() / 100.0
yData = df['Area'].as_matrix()
def func(x, a, b, c):
return (a*numpy.sin(b*x))+(c * numpy.exp(x))
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = func(xData, *parameterTuple)
return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
parameterBounds = []
parameterBounds.append([0.0, 100.0]) # search bounds for a
parameterBounds.append([0.0, 1.0]) # search bounds for b
parameterBounds.append([0.0, 1.0]) # search bounds for c
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
Upvotes: 3