Noah Tavares
Noah Tavares

Reputation: 81

How do I use scipy optimize curve fit with panda df

This is my first post here, I've spent hours looking for this answer but I can't seem to figure this out. I've used pandas to pass a .csv to a np matrix. From there I tried to apply a simple curve fit but the output I am getting is consistently wrong. the code will plot wrong fit, and will not plot the data.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

df = pd.read_csv("Results.csv")
xdata = df['Frame'].as_matrix()
ydata = df['Area'].as_matrix()

def func(x, a, b, c):
    return (a*np.sin(b*x))+(c * np.exp(x))
popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(xdata, func(xdata, *popt), 'r-',
        label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(xdata, func(xdata, *popt), 'g--',
          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

This is what the data looks like: This is what the data looks like

Thanks in advance for your help.

Upvotes: 4

Views: 7722

Answers (1)

James Phillips
James Phillips

Reputation: 4657

Your model contains "exp(x)" and the data file contains x values of 1000, and this is giving math overflow errors no matter the starting values - the optimizer cannot find a way out of that problem, and you must change the equation to fit this data set. I can suggest other equations, but this data set cannot be fit to the posted equation.

EDIT: Per your comment on dividing by 100, here is code using scipy's Differential Evolution genetic algorithm module to find initial parameter estimates, which uses the Latin Hypercube algorithm to ensure a thorough search of parameter space - that algorithm requires bounds within which to search, and ranges on parameters are much easier to find than exact initial parameter values. Here I tried a few ranges and got what is probably the best fit you can get here from what I can see.

plot

import pandas as pd
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings


df = pd.read_csv("Results.csv")
xData = df['Frame'].as_matrix() / 100.0
yData = df['Area'].as_matrix()

def func(x, a, b, c):
    return (a*numpy.sin(b*x))+(c * numpy.exp(x))


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():

    parameterBounds = []
    parameterBounds.append([0.0, 100.0]) # search bounds for a
    parameterBounds.append([0.0, 1.0]) # search bounds for b
    parameterBounds.append([0.0, 1.0]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Upvotes: 3

Related Questions