Reputation: 189
I used scikit-learn's SelectKBest to select the best features, around 500 out of 900, as follows, where d is the DataFrame of all the features.
from sklearn.feature_selection import SelectKBest, chi2, f_classif
X_new = SelectKBest(chi2, k=491).fit_transform(d, label_vs)
When I print X_new, it gives me numbers only, but I need the names of the selected features to use them later on.
I tried things like X_new.dtype.names, but I didn't get anything back. I also tried converting X_new into a DataFrame, but the only column names I got were
1, 2, 3, 4...
So is there a way to find out the names of the selected features?
Upvotes: 3
Views: 5004
Reputation: 5102
Here is how you could do it, using get_support():
chY = SelectKBest(chi2, k=491)
X_new = chY.fit_transform(d, label_vs)
column_names = [column[0] for column in zip(d.columns,chY.get_support()) if column[1]]
Following @AI_Learning's answer, you could also get the column names more directly by indexing with the boolean mask:
column_names = d.columns[chY.get_support()]
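Since the question also mentions ending up with 1, 2, 3, ... as column names after converting X_new, here is a minimal sketch of rebuilding a DataFrame that keeps the original names. The toy data below is a hypothetical stand-in for the asker's `d` and `label_vs`, and `selector` and `selected` are names introduced only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in data: 100 rows, 10 features.
# chi2 requires non-negative feature values.
rng = np.random.default_rng(0)
d = pd.DataFrame(rng.integers(0, 10, size=(100, 10)),
                 columns=[f"f{i}" for i in range(10)])
label_vs = rng.integers(0, 2, size=100)

selector = SelectKBest(chi2, k=4)
X_new = selector.fit_transform(d, label_vs)

# Boolean mask with one entry per original column; True means "kept".
mask = selector.get_support()
column_names = d.columns[mask]

# Re-attach the original names (and row index) to the selected values.
selected = pd.DataFrame(X_new, columns=column_names, index=d.index)
```

The selected columns come back in their original order, so the mask lines up with the columns of X_new.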
Upvotes: 3
Reputation: 16966
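You can use the .get_support() method of the fitted selector to get the feature names from your initial DataFrame. Note that the selector has to be fitted first, otherwise get_support() raises an error.
feature_selector = SelectKBest(chi2, k=491).fit(d, label_vs)
d.columns[feature_selector.get_support()]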
Working example:
from sklearn.datasets import load_digits
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
X, y = load_digits(return_X_y=True)
df = pd.DataFrame(X, columns=['feature %s' % i for i in range(X.shape[1])])
feature_selector = SelectKBest(chi2, k=20)
X_new = feature_selector.fit_transform(df, y)
X_new.shape
df.columns[feature_selector.get_support()]
Output:
Index(['feature 5', 'feature 6', 'feature 13', 'feature 19', 'feature 20', 'feature 21', 'feature 26', 'feature 28', 'feature 30', 'feature 33', 'feature 34', 'feature 41', 'feature 42', 'feature 43', 'feature 44', 'feature 46', 'feature 54', 'feature 58', 'feature 61', 'feature 62'], dtype='object')
Upvotes: 2