amehta

Reputation: 1317

How to find the feature names of the coefficients using scikit-learn linear regression?

I use scikit-learn linear regression, and if I change the order of the features the coefficients are still printed in the same order. Hence I would like to know the mapping of each feature to its coefficient.

#training the model
model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])

model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])

model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])

# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)

Upvotes: 37

Views: 75217

Answers (10)

StephGC

Reputation: 69

Building on Ian's answer, if you also want to add the intercept to the list of named coefficients and turn it into a DataFrame:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X=X, y=y)
res = pd.DataFrame(
    {'feature' : np.insert(lr.feature_names_in_, 0, 'intercept'), 
     'coef' : np.insert(lr.coef_, 0, lr.intercept_)}
)
res

Which returns:

    feature     coef
0   intercept   41.617270
1   crim        -0.121389
2   zn          0.046963
3   indus       0.013468

Upvotes: 0

Ian Thompson

Reputation: 3285

As of scikit-learn version 1.0, the LinearRegression estimator has a feature_names_in_ attribute. From the docs:

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

Assuming you're fitting on a pandas.DataFrame (train_data), your estimators (model_1, model_2, and model_3) will have the attribute. You can line up your coefficients using any of the methods listed in previous answers, but I'm in favor of this one:

coef_series = pd.Series(
    data=model_1.coef_,
    index=model_1.feature_names_in_
)

A minimal reproducible example

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


# for repeatability
np.random.seed(0)

# random data
Xy = pd.DataFrame(
  data=np.random.random((10, 3)),
  columns=["x0", "x1", "y"]
)

# separate X and y
X = Xy.drop(columns="y")
y = Xy.y

#  initialize estimator
lr = LinearRegression()

# fit to pandas.DataFrame
lr.fit(X, y)

# get coefficients and their respective feature names
coef_series = pd.Series(
  data=lr.coef_,
  index=lr.feature_names_in_
)

print(coef_series)
x0    0.230524
x1   -0.275611
dtype: float64

Upvotes: 5

Vaibhav Srivastava

Reputation: 29

Right after training the model, the coefficient values are stored in model.coef_ (for estimators whose coefficient array is 2-D, such as multi-output regression, the first row is model.coef_[0]). We can iterate over the column names and store each column name and its coefficient value in a dictionary.

model.fit(X_train, y)
# assuming all columns except the last one are used in training
columns = data.iloc[:, :-1].columns
coef_dict = {}
for i in range(len(columns)):
    # use model.coef_[i] instead if coef_ is a 1-D array (single-target LinearRegression)
    coef_dict[columns[i]] = model.coef_[0][i]

Hope this helps!

Upvotes: 0

Andrew Shade

Reputation: 36

All of these answers are great, but what personally worked for me was the following, since the feature names I needed were the columns of my train_data DataFrame:

# wrap coef_ in a list so it forms a single row; the number of columns must match its length
pd.DataFrame(data=[model_1.coef_], columns=train_data.columns)

Upvotes: 0

Mike Zubko

Reputation: 19

pd.DataFrame(data=regression.coef_, index=X_train.columns)

Upvotes: 0

Yagmur Rigo

Reputation: 131

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns), "Coefs", regressor.coef_.transpose())

Upvotes: 13

ZaxR

Reputation: 5155

Borrowing from Robin, but simplifying the syntax:

coef_dict = dict(zip(model_1_features, model_1.coef_))

Important note about zip: zip stops at the shortest of its inputs, which makes it especially important to confirm that the lengths of the features and coefficients match (which in more complicated models might not be the case). If one input is longer than the other, the extra values are silently dropped. Notice the missing 7 in the following example:

In [1]: [i for i in zip([1, 2, 3], [4, 5, 6, 7])]
Out[1]: [(1, 4), (2, 5), (3, 6)]
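
A minimal sketch of such a length check, assuming the model_1 and model_1_features from the question; on Python 3.10+ you can also pass strict=True so zip raises instead of silently truncating:

# fails loudly if the numbers of features and coefficients differ
assert len(model_1_features) == len(model_1.coef_), "feature/coefficient length mismatch"

# Python 3.10+ only: strict=True makes zip raise ValueError on unequal lengths
coef_dict = dict(zip(model_1_features, model_1.coef_, strict=True))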

Upvotes: 0

rocksteady

Reputation: 2242

@Robin posted a great answer, but I had to make one tweak for it to work the way I wanted: referring to the row of the coef_ array that I needed, namely model_1.coef_[0,:] (coef_ is 2-D when the model is fit on a 2-D target), as below:

coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:],model_1_features):
    coef_dict[feat] = coef

Then the dict was created as I pictured it, with {'feature_name' : coefficient_value} pairs.

Upvotes: 10

Reddspark

Reputation: 7567

Here is what I use for pretty-printing coefficients in Jupyter. I'm not sure I follow why order is an issue; as far as I know, the order of the coefficients matches the order of the input data you gave it.

Note that the first line assumes you have a pandas DataFrame called df in which you originally stored the data prior to turning it into a numpy array for regression:

fieldList = np.array(list(df)).reshape(-1,1)

coeffs = np.reshape(np.round(clf.coef_,5),(-1,1))
coeffs=np.concatenate((fieldList,coeffs),axis=1)
print(pd.DataFrame(coeffs,columns=['Field','Coeff']))

Upvotes: 0

Robin Spiess

Reputation: 1480

The trick is that right after you have trained your model, you know the order of the coefficients:

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))

This will print each coefficient together with the correct feature name. (Tested with a pandas DataFrame.)

If you want to reuse the coefficients later you can also put them in a dictionary:

coef_dict = {}
for coef, feat in zip(model_1.coef_,model_1_features):
    coef_dict[feat] = coef

(You can test it yourself by training two models with the same features but, as you said, a shuffled feature order; a sketch of that check follows.)
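
A minimal sketch of that check, assuming the train_data frame and model_1_features list from the question (model_a and model_b are hypothetical names); both mappings should agree, up to floating-point noise, regardless of the column order:

shuffled_features = list(reversed(model_1_features))

model_a = linear_model.LinearRegression()
model_a.fit(train_data[model_1_features], train_data['price'])

model_b = linear_model.LinearRegression()
model_b.fit(train_data[shuffled_features], train_data['price'])

coef_a = dict(zip(model_1_features, model_a.coef_))
coef_b = dict(zip(shuffled_features, model_b.coef_))

# the per-feature coefficients match even though the column order differs
print(coef_a)
print(coef_b)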

Upvotes: 27
