rovyko

Reputation: 4577

Scikit learn order of coefficients for multiple linear regression and polynomial features

I'm fitting a simple polynomial regression model, and I want get the coefficients from the fitted model.

Given the prep code:

import pandas as pd
from itertools import product
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# data creation
sa = [1, 0, 1, 2, 3]
sb = [2, 1, 0, 1, 2]
raw = {'a': [], 'b': [], 'w': []}
for (ai, av), (bi, bv) in product(enumerate(sa), enumerate(sb)):
    raw['a'].append(ai)
    raw['b'].append(bi)
    raw['w'].append(av + bv)
data = pd.DataFrame(raw)

# regression
x = data[['a', 'b']].values
y = data['w']
poly = PolynomialFeatures(2)
linr = LinearRegression()
model = make_pipeline(poly, linr)
model.fit(x, y)

From this answer, I know the coefficients can be obtained with

model.steps[1][1].coef_
>>> array([  0.00000000e+00,  -5.42857143e-01,  -1.71428571e+00,
             2.85714286e-01,   1.72774835e-16,   4.28571429e-01])

But this provides a 1-dimensional array and I'm not sure which numbers correspond to which variables.

Are they ordered as a0, a1, a2, b0, b1, b2 or as a0, b0, a1, b1, a2, b2?

Upvotes: 1

Views: 3833

Answers (2)

Vivek Kumar

Reputation: 36599

You can use the get_feature_names() method of PolynomialFeatures to find the order.

In the pipeline you can do this:

model.steps[0][1].get_feature_names()

# Output:
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']

If you have the feature names ('a' and 'b' in your case), you can pass them in to get the actual feature names.

model.steps[0][1].get_feature_names(['a', 'b'])

# Output:
['1', 'a', 'b', 'a^2', 'a b', 'b^2']
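To tie the names to the numbers, you can zip the feature names with the coefficients. A minimal sketch (the toy data here is made up for illustration; any two-column input works, and the try/except covers newer scikit-learn versions where this method was renamed get_feature_names_out):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Toy 2-feature data (illustrative only)
x = np.array([[0, 0], [1, 1], [2, 0], [0, 2], [3, 1]])
y = np.array([0.0, 2.0, 2.5, 1.5, 4.0])

model = make_pipeline(PolynomialFeatures(2), LinearRegression())
model.fit(x, y)

poly = model.steps[0][1]
try:
    names = poly.get_feature_names(['a', 'b'])       # scikit-learn < 1.0
except AttributeError:
    names = poly.get_feature_names_out(['a', 'b'])   # scikit-learn >= 1.0

# Each coefficient lines up with the feature name at the same index
for name, coef in zip(names, model.steps[1][1].coef_):
    print(f'{name}: {coef:.4f}')
```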

Upvotes: 3

Jonathan Guymont

Reputation: 497

First, the terms of a degree-2 polynomial in a and b are 1, a, b, a^2, ab, and b^2, and the scikit-learn implementation generates them in exactly this order. You can verify this by creating a simple set of inputs, e.g.

import numpy as np

x = np.array([[2, 3], [2, 3], [2, 3]])
print(x)
[[2 3]
 [2 3]
 [2 3]]

And then creating the polynomial features:

poly = PolynomialFeatures(2)
x_poly = poly.fit_transform(x)
print(x_poly)
[[1. 2. 3. 4. 6. 9.]
 [1. 2. 3. 4. 6. 9.]
 [1. 2. 3. 4. 6. 9.]]

You can see that the first and second features are a and b (not counting the bias term 1), the third feature is a^2 (i.e. 2^2), the fourth is ab = 2*3, and the last is b^2 = 3^2. In other words, your model is:

w = c0 + c1*a + c2*b + c3*a^2 + c4*ab + c5*b^2
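As a sanity check, a prediction computed by hand from intercept_ and coef_ in this order matches scikit-learn's own predict. A small sketch with synthetic data (the variable names and the generating polynomial are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data generated from a known degree-2 polynomial
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = (1 + 2 * x[:, 0] - x[:, 1] + 0.5 * x[:, 0] ** 2
     + x[:, 0] * x[:, 1] - 0.3 * x[:, 1] ** 2)

poly = PolynomialFeatures(2)
linr = LinearRegression().fit(poly.fit_transform(x), y)

# Rebuild one prediction by hand using the order [1, a, b, a^2, ab, b^2]
a, b = 2.0, 3.0
c = linr.coef_
manual = (linr.intercept_ + c[0] * 1 + c[1] * a + c[2] * b
          + c[3] * a ** 2 + c[4] * a * b + c[5] * b ** 2)
auto = linr.predict(poly.transform(np.array([[a, b]])))[0]
print(np.isclose(manual, auto))  # True
```

If the coefficients were in any other order, the two values would disagree.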

Upvotes: 1
