astrochris

Reputation: 1866

Calculate a polynomial regression

From what I understand, polynomial regression is a specific type of regression analysis that is more complicated than linear regression. Is there a Python module which can do this? I have looked in matplotlib, scikit and numpy but can only find linear regression analysis.

And is it possible to work out the correlation coefficient of a non-linear fit?

Upvotes: 11

Views: 38319

Answers (3)

Ioannis Nasios

Reputation: 8527

You can first make your polynomial features using PolynomialFeatures from sklearn and then use your linear model.

The function below can be used to make predictions with a trained model.

from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

# Expand the inputs into degree-2 polynomial features
poly = PolynomialFeatures(degree=2)

# Fit an ordinary linear model on the expanded features
lm_polyfeats = linear_model.LinearRegression()
lm_polyfeats.fit(poly.fit_transform(array2D), targetArray)

def LM_polynomialFeatures_2Darray(lm_polyfeats, array2D):
    # Apply the same polynomial expansion before predicting
    array2D = poly.transform(array2D)
    return lm_polyfeats.predict(array2D)

p = LM_polynomialFeatures_2Darray(lm_polyfeats, array2D)
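For a concrete (hypothetical) example of the same pattern, here is a minimal sketch on made-up toy data; array2D and targetArray below are placeholders I invented for illustration, not from the answer above:

import numpy as np
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

# Made-up toy data: the target is a quadratic function of the two inputs
array2D = np.array([[0., 1.], [1., 2.], [2., 3.], [3., 4.], [4., 5.]])
targetArray = array2D[:, 0] ** 2 + array2D[:, 1]

poly = PolynomialFeatures(degree=2)
lm_polyfeats = linear_model.LinearRegression()
lm_polyfeats.fit(poly.fit_transform(array2D), targetArray)

# A degree-2 model can represent this target exactly
print(lm_polyfeats.predict(poly.transform(array2D)))  # approximately [1., 3., 7., 13., 21.]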

Upvotes: 1

adrianus

Reputation: 3199

Have you had a look at NumPy's polyfit? See the numpy.polyfit reference documentation.

From their examples:

>>> import numpy as np
>>> x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
>>> y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
>>> z = np.polyfit(x, y, 3)
>>> z
array([ 0.08703704, -0.81349206,  1.69312169, -0.03968254])
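Regarding the second part of the question (a correlation coefficient for a non-linear fit): one standard option, sketched here as an assumption rather than taken from this answer, is the coefficient of determination R², computed from the fit's residuals:

>>> p = np.poly1d(z)            # wrap the coefficients as a callable polynomial
>>> y_hat = p(x)                # fitted values at the sample points
>>> ss_res = np.sum((y - y_hat) ** 2)
>>> ss_tot = np.sum((y - np.mean(y)) ** 2)
>>> r_squared = 1 - ss_res / ss_tot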

Upvotes: 16

fferri

Reputation: 18950

scikit-learn supports both linear and polynomial regression.

Check the Generalized Linear Models page at section Polynomial regression: extending linear models with basis functions.

Example:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> import numpy as np
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(degree=2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])

The features of X have been transformed from [x_1, x_2] to [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2], and can now be used within any linear model.

This sort of preprocessing can be streamlined with the Pipeline tools. A single object representing a simple polynomial regression can be created and used as follows:

>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.pipeline import Pipeline
>>> model = Pipeline([('poly', PolynomialFeatures(degree=3)),
...                   ('linear', LinearRegression(fit_intercept=False))])
>>> # fit to an order-3 polynomial data
>>> x = np.arange(5)
>>> y = 3 - 2 * x + x ** 2 - x ** 3
>>> model = model.fit(x[:, np.newaxis], y)
>>> model.named_steps['linear'].coef_
array([ 3., -2.,  1., -1.])

The linear model trained on polynomial features is able to exactly recover the input polynomial coefficients.
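Once fitted, the pipeline applies the polynomial expansion automatically at prediction time. For example (a small usage sketch continuing from the code above; since the coefficients are recovered exactly, the predictions match the true polynomial):

>>> x_new = np.array([5.0, 6.0])
>>> model.predict(x_new[:, np.newaxis])
array([-107., -189.])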

In some cases it's not necessary to include higher powers of any single feature, but only the so-called interaction features, which multiply together at most d distinct features. These can be obtained from PolynomialFeatures with the setting interaction_only=True.
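For example, with the same X as above (a quick sketch):

>>> poly = PolynomialFeatures(degree=2, interaction_only=True)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])

The columns are [1, x_1, x_2, x_1 x_2]; the pure squares x_1^2 and x_2^2 are dropped.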

Upvotes: 13
