Javi

Reputation: 385

Scikit-learn tree-based feature selection: how to keep the column names?

I want to do tree-based feature selection. My dataset has about 30 columns, and after the selection about 5 remain, which is great for me. The problem is that the resulting 5-column dataset does not keep the column names, so I cannot tell which features were selected.

import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

data = pd.read_csv(file)
X = data.drop(columns='target')  # keyword form; the positional axis argument was removed in pandas 2.0
y = data['target']
X.shape                        #(100000, 30)
clf = ExtraTreesClassifier()
clf = clf.fit(X, y)
clf.feature_importances_  

model = SelectFromModel(clf, prefit=True)
X_new = model.transform(X)
X_new.shape                    #(100000, 5)

Can someone help me please?

Upvotes: 0

Views: 840

Answers (1)

ehudk

Reputation: 585

Now that I'm more sure of the answer, please try the following:

mask = model.get_support(indices=False)  # boolean mask over the columns of X
X_new = X.loc[:, mask]                   # the sliced DataFrame, keeping only the selected columns
featured_col_names = X_new.columns       # Index of the selected column names

If all you need is the column names:
X.columns[model.get_support()]
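
For anyone who wants to try this without the original CSV, here is a minimal end-to-end sketch. The data is synthetic (generated with make_classification), and the column names f0 ... f29, the sample count, and the random_state values are my own assumptions for illustration, not anything from the question. Which columns get selected depends on the fitted forest, so no particular result is implied.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the CSV in the question: 30 features, 5 informative.
# The column names f0..f29 are hypothetical.
features, target = make_classification(
    n_samples=500, n_features=30, n_informative=5, random_state=0
)
X = pd.DataFrame(features, columns=[f"f{i}" for i in range(30)])
y = pd.Series(target, name="target")

# Fit the tree ensemble and wrap it in a selector, as in the question.
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
model = SelectFromModel(clf, prefit=True)

# Boolean mask with one entry per column of X (True = column is kept).
mask = model.get_support()

# Slice the DataFrame with the mask instead of calling transform(),
# so the result is still a DataFrame with its column names.
X_new = X.loc[:, mask]
selected_names = list(X_new.columns)

print(X_new.shape)
print(selected_names)
```

The point is that transform() returns a plain NumPy array, while indexing the original DataFrame with the support mask gives the same columns with their labels intact.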

Upvotes: 1
