Juan Carlos Coto

Reputation: 12564

How to fit a polynomial curve to data using scikit-learn?

Problem context

Using scikit-learn with Python, I'm trying to fit a quadratic polynomial curve to a set of data, so that the model is of the form y = a2*x^2 + a1*x + a0, with the coefficients a2, a1, a0 provided by the fitted model.

The problem

I don't know how to fit a polynomial curve using that package, and there seem to be surprisingly few clear references on how to do it (I've looked for a while). I've seen this question on doing something similar with NumPy, and also this question, which does a more complicated fit than I require.

What a good solution would look like

Hopefully, a good solution would look something like this (sample adapted from the linear-fit code I'm using):

x = my_x_data.reshape(len(my_x_data), 1)
y = my_y_data.reshape(len(my_y_data), 1)
regression = linear_model.LinearRegression(degree=2) # or PolynomialRegression(degree=2) or QuadraticRegression()
regression.fit(x, y)

I would imagine scikit-learn has a facility like this, since it's a pretty common use case (in R, for example, the fitting formula can be specified in code, and the two should be fairly interchangeable for that kind of use case).

The question:

What is a good way to do this, or where can I find information about how to do this properly?

Upvotes: 15

Views: 30543

Answers (4)

sezanzeb

Reputation: 1129

Here is how to do it in a neat pipeline:

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

model = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression())
])

model.fit(x, y)

model.predict([[1], [2]])
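
For context, a minimal end-to-end sketch of the pipeline above, using made-up single-feature data (x must be a 2-D array of shape (n_samples, n_features)):

import numpy as np

x = np.arange(10).reshape(-1, 1)                     # made-up inputs
y = 2 * x.ravel() ** 2 + np.random.normal(0, 1, 10)  # noisy quadratic, for illustration only

model.fit(x, y)
print(model.predict([[1], [2]]))  # predictions at x = 1 and x = 2

If you need the fitted coefficients, they live on the final step, e.g. model.named_steps['linear'].coef_ (note they refer to the polynomial features of the scaled x, not to raw x).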

Upvotes: 0

pwellner

Reputation: 30

AGML's answer can be wrapped in a scikit-learn-compatible class like this:

import numpy as np


class PolyEstimator:
    def __init__(self, degree=2):
        self.degree = degree

    def fit(self, x, y):
        # np.polyfit expects 1-D x, so flatten the (n_samples, 1) column vector
        self.z = np.poly1d(np.polyfit(np.asarray(x).ravel(), y, self.degree))
        return self  # scikit-learn estimators return self from fit

    def predict(self, x):
        return self.z(np.asarray(x).ravel())
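
A quick usage sketch with made-up data (not part of the original answer):

import numpy as np

est = PolyEstimator(degree=2)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1.5 * x.ravel() ** 2 - 2.0 * x.ravel() + 0.5
est.fit(x, y)
print(est.z)                        # the fitted np.poly1d (a2, a1, a0)
print(est.predict([[0.0], [1.0]]))

For full compatibility with scikit-learn utilities such as cross_val_score or GridSearchCV, the class would also need to inherit from sklearn.base.BaseEstimator (for get_params/set_params) and RegressorMixin (for a default score method).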

Upvotes: 0

rabbit

Reputation: 1476

I believe the answer by Salvador Dali here will answer your question. In scikit-learn, it suffices to construct polynomial features from your data and then run linear regression on that expanded dataset. If you're interested, you can find more information in the documentation here. For convenience, here is the sample code that Salvador Dali provided:

from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

X = [[0.44, 0.68], [0.99, 0.23]]
vector = [109.85, 155.72]
predict = [[0.49, 0.18]]  # must be 2-D: one row per sample

poly = PolynomialFeatures(degree=2)
X_ = poly.fit_transform(X)
predict_ = poly.transform(predict)

clf = linear_model.LinearRegression()
clf.fit(X_, vector)
print(clf.predict(predict_))
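
Applying the same idea to the single-variable quadratic from the question only requires shaping x as a column vector. A minimal sketch, reusing the my_x_data / my_y_data names from the question:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.asarray(my_x_data).reshape(-1, 1)  # shape (n_samples, 1)
y = np.asarray(my_y_data)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # columns: x, x^2
regression = LinearRegression().fit(X_poly, y)

a0 = regression.intercept_
a1, a2 = regression.coef_  # y = a2*x^2 + a1*x + a0

With include_bias=False the constant term comes from the intercept, so the fitted values map directly onto the a0, a1, a2 in the question.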

Upvotes: 9

AGML

Reputation: 920

Possible duplicate: https://stats.stackexchange.com/questions/58739/polynomial-regression-using-scikit-learn.

Is it crucial for some reason that this be done using scikit-learn? The operation you want can be performed very easily using numpy:

import numpy as np
z = np.poly1d(np.polyfit(x, y, 2))

After which z(x) returns the value of the fit at x.

A scikit-learn solution would almost certainly be simply a wrapper around the same code.
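
A concrete sketch with made-up noisy data, showing how to read the fitted coefficients back out (np.poly1d stores them highest degree first):

import numpy as np

x = np.linspace(-5, 5, 100)
y = 3.0 * x**2 - 1.0 * x + 2.0 + np.random.normal(0, 0.5, x.size)

z = np.poly1d(np.polyfit(x, y, 2))
print(z.coefficients)  # approximately [a2, a1, a0] = [3, -1, 2]
print(z(0.5))          # value of the fitted curve at x = 0.5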

Upvotes: 20
