rovyko

Reputation: 4577

Scikit learn order of coefficients for multiple linear regression and polynomial features

I'm fitting a simple polynomial regression model, and I want get the coefficients from the fitted model.

Given the prep code:

import pandas as pd
from itertools import product
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# data creation
sa = [1, 0, 1, 2, 3]
sb = [2, 1, 0, 1, 2]
raw = {'a': [], 'b': [], 'w': []}
for (ai, av), (bi, bv) in product(enumerate(sa), enumerate(sb)):
    raw['a'].append(ai)
    raw['b'].append(bi)
    raw['w'].append(av + bv)
data = pd.DataFrame(raw)

# regression
x = data[['a', 'b']].values
y = data['w']
poly = PolynomialFeatures(2)
linr = LinearRegression()
model = make_pipeline(poly, linr)
model.fit(x, y)

From this answer, I know the coefficients can be obtained with

model.steps[1][1].coef_
>>> array([  0.00000000e+00,  -5.42857143e-01,  -1.71428571e+00,
             2.85714286e-01,   1.72774835e-16,   4.28571429e-01])

But this provides a 1-dimensional array and I'm not sure which numbers correspond to which variables.

Are they ordered as a0, a1, a2, b0, b1, b2 or as a0, b0, a1, b1, a2, b2?

Upvotes: 1

Views: 3833

Answers (2)

Vivek Kumar

Reputation: 36599

You can use the get_feature_names() method of PolynomialFeatures to find the order.

In the pipeline you can do this:

model.steps[0][1].get_feature_names()

# Output:
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']

If you have the feature names ('a' and 'b' in your case), you can pass them in to get the actual feature names.

model.steps[0][1].get_feature_names(['a', 'b'])

# Output:
['1', 'a', 'b', 'a^2', 'a b', 'b^2']
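To tie the names to the numbers, you can zip the feature names with the coefficients. A minimal sketch (the toy data here is made up for illustration; any two-column input works, and the try/except covers newer scikit-learn versions where this method was renamed get_feature_names_out):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Toy 2-feature data (illustrative only)
x = np.array([[0, 0], [1, 1], [2, 0], [0, 2], [3, 1]])
y = np.array([0.0, 2.0, 2.5, 1.5, 4.0])

model = make_pipeline(PolynomialFeatures(2), LinearRegression())
model.fit(x, y)

poly = model.steps[0][1]
try:
    names = poly.get_feature_names(['a', 'b'])       # scikit-learn < 1.0
except AttributeError:
    names = poly.get_feature_names_out(['a', 'b'])   # scikit-learn >= 1.0

# Each coefficient lines up with the feature name at the same index
for name, coef in zip(names, model.steps[1][1].coef_):
    print(f'{name}: {coef:.4f}')
```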

Upvotes: 3

Jonathan Guymont

Reputation: 497

First, the terms of a degree-2 polynomial in a and b are 1, a, b, a^2, ab, and b^2, and the scikit-learn implementation generates them in exactly this order. You can verify this by creating a simple set of inputs, e.g.

import numpy as np

x = np.array([[2, 3], [2, 3], [2, 3]])
print(x)
[[2 3]
 [2 3]
 [2 3]]

And then creating the polynomial features:

poly = PolynomialFeatures(2)
x_poly = poly.fit_transform(x)
print(x_poly)
[[1. 2. 3. 4. 6. 9.]
 [1. 2. 3. 4. 6. 9.]
 [1. 2. 3. 4. 6. 9.]]

You can see that the first and second features are a and b (not counting the bias term 1), the third feature is a^2 (i.e. 2^2), the fourth is ab = 2*3, and the last is b^2 = 3^2. In other words, your model is:

w = c0 + c1*a + c2*b + c3*a^2 + c4*ab + c5*b^2
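As a sanity check, a prediction computed by hand from intercept_ and coef_ in this order matches scikit-learn's own predict. A small sketch with synthetic data (the variable names and the generating polynomial are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data generated from a known degree-2 polynomial
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = (1 + 2 * x[:, 0] - x[:, 1] + 0.5 * x[:, 0] ** 2
     + x[:, 0] * x[:, 1] - 0.3 * x[:, 1] ** 2)

poly = PolynomialFeatures(2)
linr = LinearRegression().fit(poly.fit_transform(x), y)

# Rebuild one prediction by hand using the order [1, a, b, a^2, ab, b^2]
a, b = 2.0, 3.0
c = linr.coef_
manual = (linr.intercept_ + c[0] * 1 + c[1] * a + c[2] * b
          + c[3] * a ** 2 + c[4] * a * b + c[5] * b ** 2)
auto = linr.predict(poly.transform(np.array([[a, b]])))[0]
print(np.isclose(manual, auto))  # True
```

If the coefficients were in any other order, the two values would disagree.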

Upvotes: 1
