ninesalt

Reputation: 4354

Estimate value from curve fit

I'm trying to write some very basic Python code that outputs a number based on a sample of inputs and outputs. So for example if:

import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 5, 10, 17, 26]

z = np.interp(7, x, y)
print(z)  # expected 50, actual was 26

I'd like some way of finding a best-fit function that maps these values to each other, so that I can pass it another x value and get a rough approximation of the y value. I tried reading about scipy.optimize.curve_fit, but as far as I can tell this isn't what I should be using, because it uses a predefined function, which in my case I don't have.

Note that I have no restriction on whether the function should be linear, periodic, quadratic, etc., because my values will vary, but my assumption is that most of the functions will be linear.

I also tried numpy.interp, but for any x beyond the sample range I just get the last value in the y array back.

EDIT: After experimenting with Cleb's answer and comparing it with kennytm's original approach, here are my findings (comparison plot not shown; a rough sketch to reproduce it follows below). The most accurate technique here should be the one nearest to the red line. The green line represents kennytm's approach (quadratic regression was the most accurate one I tried) and the black line represents Cleb's technique (UnivariateSpline). It appears that since UnivariateSpline has no prior knowledge of the underlying model, it is a little better at adapting to the values of the function, which makes it somewhat more accurate.
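For reference, here is a rough sketch of how the comparison could be reproduced, assuming the true underlying function is y = x**2 + 1 (which matches the sample points) and using matplotlib for the plot:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.interpolate import UnivariateSpline

x = [1, 2, 3, 4, 5]
y = [2, 5, 10, 17, 26]

xs = np.linspace(1, 7, 200)

# red line: assumed true function y = x**2 + 1
plt.plot(xs, xs**2 + 1, 'r', label='assumed true function')

# green line: quadratic regression via curve_fit (kennytm's approach)
def quadratic(t, a, b, c):
    return a*t*t + b*t + c

params, _ = curve_fit(quadratic, x, y)
plt.plot(xs, quadratic(xs, *params), 'g', label='quadratic fit')

# black line: UnivariateSpline (Cleb's approach)
spl = UnivariateSpline(x, y)
plt.plot(xs, spl(xs), 'k', label='UnivariateSpline')

plt.scatter(x, y, color='b', label='samples')
plt.legend()
plt.show()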

Upvotes: 2

Views: 3265

Answers (2)

Cleb

Reputation: 25997

Another option is to use a spline, e.g. scipy.interpolate.UnivariateSpline, if you don't care about the underlying model (e.g. whether it is linear, cubic, etc.) and about overfitting.

Then you can just do:

from scipy.interpolate import UnivariateSpline

x = [1, 2, 3, 4, 5]
y = [2, 5, 10, 17, 26]
spl = UnivariateSpline(x, y)

To get an estimate at x=7, you can now simply do:

spl(7)

which returns the value you expected:

array(49.99999999999993)

This approach avoids the definition of a model.
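If you do want some control over the fit, UnivariateSpline also accepts a degree k (default 3) and a smoothing factor s. A minimal sketch; note that a lower-degree spline will extrapolate differently beyond the data, so the value at x=7 will not match the cubic default:

from scipy.interpolate import UnivariateSpline

x = [1, 2, 3, 4, 5]
y = [2, 5, 10, 17, 26]

# k=1 fits piecewise-linear segments instead of cubics,
# so the extrapolation beyond x=5 behaves differently
lin_spl = UnivariateSpline(x, y, k=1)
print(lin_spl(7))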

Upvotes: 3

kennytm

Reputation: 523214

I tried reading about scipy.optimize.curve_fit, but as far as I can tell this isn't what I should be using, because it uses a predefined function, which in my case I don't have.

Actually, the function you pass to scipy.optimize.curve_fit is the model you want to fit. Say you want linear regression; then you use:

import scipy.optimize

x = [1, 2, 3, 4, 5]
y = [2, 5, 10, 17, 26]

def linear(x, a, b):
    return a*x + b

fit_params, _ = scipy.optimize.curve_fit(linear, x, y)
print(linear(7, *fit_params))
# 36.0

Similarly for quadratic regression, etc.:

def quadratic(x, a, b, c):
    return a*x*x + b*x + c

fit_params, _ = scipy.optimize.curve_fit(quadratic, x, y)
print(quadratic(7, *fit_params))
# 50.000000000004555

(The second return value of curve_fit is the covariance matrix of the fitted parameters, which gives a rough picture of how good the fit is.)
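For example, the square roots of the diagonal of that covariance matrix give rough one-standard-deviation error estimates for the fitted parameters; a small sketch:

import numpy

fit_params, pcov = scipy.optimize.curve_fit(quadratic, x, y)
perr = numpy.sqrt(numpy.diag(pcov))  # approximate 1-sigma errors of a, b, c
print(perr)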


If you just want to fit a polynomial with least squares, you could use numpy.polyfit instead.

linear_coeff = numpy.polyfit(x, y, deg=1)
print(numpy.polyval(linear_coeff, 7))
# 35.999999999999986

quadratic_coeff = numpy.polyfit(x, y, deg=2)
print(numpy.polyval(quadratic_coeff, 7))
# 50.000000000000085
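The returned coefficients can also be wrapped in numpy.poly1d to get a callable polynomial, which is convenient when evaluating many points:

quadratic_poly = numpy.poly1d(quadratic_coeff)
print(quadratic_poly(7))  # same result as numpy.polyval above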

Upvotes: 2
