Reputation: 1081
I want to fit a function to the independent (X) and dependent (y) variables:
import numpy as np
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
1.02251471, 1.01268524, 0.98535659, 0.97400591])
X = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602, 17.904761904850602,
22.904761904850602, 31.238095237873495, 37.95833333302289,
44.67857142863795, 51.39880952378735, 64.83928571408615,
71.5595238097012, 85., 98.55357142863795, 112.1071428572759])
I have already tried SciPy's curve_fit in this way:
from scipy.optimize import curve_fit
def func(x, a, b, c):
    # Reciprocal of a quadratic in x
    return 1 / (a * x**2 + b * x + c)
g = [1, 1, 1]  # initial guess for a, b, c
c, cov = curve_fit(func, X.flatten(), y.flatten(), g)
test_ar = np.arange(min(X), max(X), 0.25)
pred = np.empty(len(test_ar))
for i in range(len(test_ar)):
    pred[i] = func(test_ar[i], c[0], c[1], c[2])
I can add higher-order polynomial terms to make my func more accurate, but I want to keep it simple. I would very much appreciate it if anyone can give me some help on how to find another function or make my prediction better. The figure shows the result of the prediction:
Upvotes: 0
Views: 5283
Reputation: 417
The first thing you want to do is specify how you measure "accuracy", which in your case is not an appropriate term at all.
What you are essentially doing is regression. Suitable metrics in this case are mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). It is up to you to decide which metric to use and what threshold counts as "acceptable".
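As a minimal sketch of those metrics, the snippet below computes MSE, RMSE, and MAE with NumPy for the data from the question. The cubic `np.polyfit` fit is only a stand-in model chosen for illustration; any set of predictions can be scored the same way.

```python
import numpy as np

# Data from the question
x = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602,
              17.904761904850602, 22.904761904850602, 31.238095237873495,
              37.95833333302289, 44.67857142863795, 51.39880952378735,
              64.83928571408615, 71.5595238097012, 85., 98.55357142863795,
              112.1071428572759])
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
              1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
              1.02251471, 1.01268524, 0.98535659, 0.97400591])

# Stand-in model (an assumption for illustration): least-squares cubic
coeffs = np.polyfit(x, y, deg=3)
y_pred = np.polyval(coeffs, x)

residuals = y - y_pred
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root mean squared error, same units as y
mae = np.mean(np.abs(residuals))   # mean absolute error

print(f"MSE={mse:.6f}  RMSE={rmse:.6f}  MAE={mae:.6f}")
```

RMSE and MAE are in the same units as y, which usually makes them easier to compare against an "acceptable" threshold than MSE.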
The image you show above (where you've fitted the curve) looks fine, but please expand your X-axis to run from -100 to 300 and show us the image again. Behavior outside the range of the data is a problem with high-degree polynomials.
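To see the issue numerically, here is a small sketch that evaluates the cubic reported later in this answer (coefficients copied from the `coef` and `intercept` output) inside and outside the measured range. The helper name `cubic` is made up for illustration.

```python
import numpy as np

# Coefficients copied from the LinearRegression output in this answer
intercept = 1.5042389677980577
c1, c2, c3 = -1.67456732e-02, 2.03899728e-04, -8.70976426e-07

def cubic(x):
    """Evaluate y = intercept + c1*x + c2*x^2 + c3*x^3 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return intercept + c1 * x + c2 * x**2 + c3 * x**3

# The measured x values run from about 4.6 to 112.1
for x0 in (-100.0, 50.0, 300.0):
    print(f"x = {x0:6.1f} -> y = {float(cubic(x0)):.3f}")
```

Inside the data range the cubic stays within the observed y values (roughly 0.97 to 1.46), but it gives about 6.1 at x = -100 and about -8.7 at x = 300.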
This is a 101 example of how to use regression in scikit-learn. In your case, if you want to use x^2 or x^3 to predict y, you just need to add them to the data. Currently your X variable is an array (a vector); you need to expand it into a matrix where each column is a feature (x, x^2, x^3, ...).
Here is some code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
y = [1.45952016, 1.36947283, 1.31433227, 1.24076599,
1.20577963, 1.14454815, 1.13068077, 1.09638278,
1.08121406, 1.04417094, 1.02251471, 1.01268524, 0.98535659,
0.97400591]
x = [4.571428571362048, 8.771428571548313, 12.404761904850602,
17.904761904850602, 22.904761904850602, 31.238095237873495,
37.95833333302289, 44.67857142863795, 51.39880952378735,
64.83928571408615, 71.5595238097012, 85., 98.55357142863795, 112.1071428572759]
df = pd.DataFrame({
'x' : x,
'x^2': [i**2 for i in x],
'x^3': [i**3 for i in x],
'y': y
})
X = df[['x','x^2','x^3']]
y = df['y']
model = linear_model.LinearRegression()
model.fit(X, y)
y1 = model.predict(X)
coef = model.coef_
intercept = model.intercept_
You can see the coefficients in the coef variable:
array([-1.67456732e-02, 2.03899728e-04, -8.70976426e-07])
You can see the intercept in the intercept variable:
1.5042389677980577
which in your case means y1 = -1.67e-2*x + 2.04e-4*x^2 - 8.71e-7*x^3 + 1.50
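The manual feature expansion above can also be sketched with scikit-learn's `PolynomialFeatures`, which builds the x, x^2, x^3 columns for you. This is an alternative to the DataFrame approach, not a correction of it; the names `poly_model`, `linreg`, and `grid` are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Data from the question
x = np.array([4.571428571362048, 8.771428571548313, 12.404761904850602,
              17.904761904850602, 22.904761904850602, 31.238095237873495,
              37.95833333302289, 44.67857142863795, 51.39880952378735,
              64.83928571408615, 71.5595238097012, 85., 98.55357142863795,
              112.1071428572759])
y = np.array([1.45952016, 1.36947283, 1.31433227, 1.24076599, 1.20577963,
              1.14454815, 1.13068077, 1.09638278, 1.08121406, 1.04417094,
              1.02251471, 1.01268524, 0.98535659, 0.97400591])

# scikit-learn expects a 2-D matrix: one row per sample, one column per feature
X = x.reshape(-1, 1)

# include_bias=False drops the constant column; LinearRegression fits the intercept
poly_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
)
poly_model.fit(X, y)

# make_pipeline names each step after its lowercased class name
linreg = poly_model.named_steps["linearregression"]
print("coefficients:", linreg.coef_)
print("intercept:", linreg.intercept_)

# Predict on a dense grid inside the measured range
grid = np.linspace(x.min(), x.max(), 200).reshape(-1, 1)
pred = poly_model.predict(grid)
```

Since both versions solve the same least-squares problem on the same features, the printed coefficients and intercept should agree with the values shown above up to floating-point error.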
Upvotes: 1