Link_tester

Reputation: 1081

How to fit a function to data in Python

I want to fit a function to the independent (X) and dependent (y) variables:

import numpy as np
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
       1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
       1.02251471, 1.01268524, 0.98535659, 0.97400591])
X = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602, 17.904761904850602,
            22.904761904850602, 31.238095237873495, 37.95833333302289, 
            44.67857142863795, 51.39880952378735, 64.83928571408615, 
            71.5595238097012, 85., 98.55357142863795, 112.1071428572759])

I already tried the scipy package like this:

from scipy.optimize import curve_fit

# rational function: y = 1 / (a*x^2 + b*x + c)
def func(x, a, b, c):
    return 1 / (a * x**2 + b * x + c)

g = [1, 1, 1]  # initial guess for a, b, c
c, cov = curve_fit(func, X.flatten(), y.flatten(), g)

# evaluate the fitted function on a fine grid
test_ar = np.arange(min(X), max(X), 0.25)
pred = func(test_ar, *c)

I could add higher polynomial orders to make my func more accurate, but I want to keep it simple. I would very much appreciate any help on finding another function or improving my prediction. The figure shows the result of the prediction:

[figure: data points with the fitted prediction]

Upvotes: 0

Views: 5283

Answers (1)

Vasil Yordanov

Reputation: 417

The first thing you want to do is specify how you measure "accuracy", which in your case is not really the appropriate term.

What you are essentially doing is called linear regression. Suitable metrics in this case are mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). It is up to you to decide which metric to use and what threshold counts as "acceptable".
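For concreteness, here is a minimal sketch of computing those three metrics with scikit-learn. The `y_true`/`y_pred` values below are made up for illustration; in practice you would use your observed data and the fitted values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# made-up observations and predictions, for illustration only
y_true = np.array([1.0, 1.1, 1.2, 1.3])
y_pred = np.array([1.02, 1.08, 1.25, 1.28])

mse = mean_squared_error(y_true, y_pred)   # mean squared error
rmse = np.sqrt(mse)                        # root mean squared error
mae = mean_absolute_error(y_true, y_pred)  # mean absolute error
print(mse, rmse, mae)
```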

The image that you are showing above (where you've fitted the line) looks fine, BUT please expand your X-axis from -100 to 300 and look at the plot again: this is a known problem with high-degree polynomials, which behave wildly outside the range of the data.
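You can see this without a plot by evaluating the cubic fit outside the data range. This sketch uses the fitted coefficients and intercept quoted further down in this answer:

```python
import numpy as np

# cubic fit: y = intercept + c1*x + c2*x^2 + c3*x^3
coef = np.array([-1.67456732e-02, 2.03899728e-04, -8.70976426e-07])
intercept = 1.5042389677980577

def cubic(x):
    return intercept + coef[0]*x + coef[1]*x**2 + coef[2]*x**3

# inside the data range the fit stays near the observed y values...
print(cubic(50.0))
# ...but outside it the cubic term dominates and the curve runs away
print(cubic(300.0), cubic(-100.0))
```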

This is a 101 example of how to use regression in scikit-learn. If you want to use x^2 or x^3 for predicting y, you just need to add them to the data. Currently your X variable is an array (a vector); you need to expand it into a matrix where each column is a feature (x, x^2, x^3, ...).

Here is some code:

import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

y = [1.45952016, 1.36947283, 1.31433227, 1.24076599,
     1.20577963, 1.14454815, 1.13068077, 1.09638278,
     1.08121406, 1.04417094, 1.02251471, 1.01268524,
     0.98535659, 0.97400591]

x = [4.571428571362048, 8.771428571548313, 12.404761904850602,
     17.904761904850602, 22.904761904850602, 31.238095237873495,
     37.95833333302289, 44.67857142863795, 51.39880952378735,
     64.83928571408615, 71.5595238097012, 85., 98.55357142863795,
     112.1071428572759]

# build the feature matrix: each column is one polynomial feature
df = pd.DataFrame({
    'x': x,
    'x^2': [i**2 for i in x],
    'x^3': [i**3 for i in x],
    'y': y
})

X = df[['x', 'x^2', 'x^3']]
y = df['y']

model = linear_model.LinearRegression()
model.fit(X, y)
y1 = model.predict(X)  # fitted values at the original x points

coef = model.coef_            # one coefficient per feature column
intercept = model.intercept_

print(mean_squared_error(y, y1), r2_score(y, y1))

plt.scatter(x, y, label='data')
plt.plot(x, y1, color='red', label='cubic fit')
plt.legend()
plt.show()

[plot: data points with the fitted regression curve]

You can see the coefficients in the coef variable:

array([-1.67456732e-02,  2.03899728e-04, -8.70976426e-07])

You can see the intercept in the intercept variable:

1.5042389677980577

which in your case means: y1 = -1.67e-2*x + 2.03e-4*x^2 - 8.70e-7*x^3 + 1.5
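As a quick sanity check, plugging the original x values into that polynomial should reproduce the data closely. A sketch using numpy, with the coefficients and intercept copied from above:

```python
import numpy as np

x = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602,
              17.904761904850602, 22.904761904850602, 31.238095237873495,
              37.95833333302289, 44.67857142863795, 51.39880952378735,
              64.83928571408615, 71.5595238097012, 85., 98.55357142863795,
              112.1071428572759])
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
              1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
              1.02251471, 1.01268524, 0.98535659, 0.97400591])

coef = np.array([-1.67456732e-02, 2.03899728e-04, -8.70976426e-07])
intercept = 1.5042389677980577

y1 = intercept + coef[0]*x + coef[1]*x**2 + coef[2]*x**3
print(np.abs(y1 - y).max())  # maximum absolute residual
```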

Upvotes: 1
