Reputation: 1612
When doing feature selection with the feature_selection module from sklearn, is there a way to keep track of the actual feature names instead of the default "f1", "f2", etc.? I have a huge number of features, so I can't keep track of them manually. Obviously, I can write code to do this, but I'm wondering if there's just some easy option that I can set.
Upvotes: 1
Views: 822
Reputation: 381
If your data is in a pandas DataFrame, you can get the names of the columns the selector kept by using its get_support method. It returns a boolean mask over the input features, which you can use to index the DataFrame's columns.
Here is a quick example, adapted from the official documentation.
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
X = [[ 0.87, -1.34,  0.31, 0],
     [-2.79, -0.02, -0.85, 1],
     [-1.34, -0.48, -2.55, 0],
     [ 1.92,  1.48,  0.65, 1]]
df = pd.DataFrame(X, columns=['col1', 'col2', 'col3', 'label'])
train_x = df.loc[:, ['col1', 'col2', 'col3']]
y = df.label
selector = SelectFromModel(estimator=LogisticRegression()).fit(train_x, y)
# get_support() returns a boolean mask with one entry per input column
col_index = selector.get_support()
print(train_x.columns[col_index])
# Output: Index(['col2'], dtype='object')
Upvotes: 2