Talal Ghannam

Reputation: 189

Getting the feature names from SelectKBest

I used scikit-learn's SelectKBest to select the best features, around 500 out of 900, as follows, where d is the DataFrame of all the features.

from sklearn.feature_selection import SelectKBest, chi2, f_classif
X_new = SelectKBest(chi2, k=491).fit_transform(d, label_vs)

When I print X_new now, it gives me only numbers, but I need the names of the selected features to use later on.

I tried things like X_new.dtype.names, but I didn't get anything back. I also tried converting X_new into a DataFrame, but the only column names I got were

1, 2, 3, 4... 

So is there a way to get the names of the selected features?

Upvotes: 3

Views: 5004

Answers (2)

Jibin Mathew

Reputation: 5102

Here is how you could do it, using get_support():

chY = SelectKBest(chi2, k=491)
X_new = chY.fit_transform(d, label_vs)
column_names = [name for name, selected in zip(d.columns, chY.get_support()) if selected]

As in @AI_Learning's answer, you could also get the column names with:

column_names = d.columns[chY.get_support()]
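To see what the mask from get_support() is doing, here is a minimal sketch in plain Python. The column names and the True/False flags are made-up stand-ins for d.columns and chY.get_support(), not values from the question's data.

```python
# Hypothetical stand-ins for d.columns and chY.get_support().
columns = ["age", "height", "weight", "income"]
support = [True, False, True, False]

# Pair each name with its flag and keep only the names flagged True.
column_names = [name for name, keep in zip(columns, support) if keep]
print(column_names)  # ['age', 'weight']
```

The mask has one entry per input column, in the same order as the columns, which is why zipping the two (or indexing d.columns with the mask) lines the names up with the features that were kept.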

Upvotes: 3

Venkatachalam

Reputation: 16966

You can use the get_support() method of the fitted selector to get the feature names from your initial DataFrame.

feature_selector = SelectKBest(chi2, k=491)
feature_selector.fit(d, label_vs)  # get_support() only works on a fitted selector
d.columns[feature_selector.get_support()]

Working example:

from sklearn.datasets import load_digits
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
X, y = load_digits(return_X_y=True)
df = pd.DataFrame(X, columns=['feature %s' % i for i in range(X.shape[1])])

feature_selector = SelectKBest(chi2, k=20)

X_new = feature_selector.fit_transform(df, y)
X_new.shape

df.columns[feature_selector.get_support()]

Output:

Index(['feature 5', 'feature 6', 'feature 13', 'feature 19', 'feature 20', 'feature 21', 'feature 26', 'feature 28', 'feature 30', 'feature 33', 'feature 34', 'feature 41', 'feature 42', 'feature 43', 'feature 44', 'feature 46', 'feature 54', 'feature 58', 'feature 61', 'feature 62'], dtype='object')
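If you also want the transformed data itself to carry the names, one option is to wrap the array returned by fit_transform in a new DataFrame. This is only a sketch: it uses the iris dataset with k=2 as a small stand-in for your data, and the variable names are my own.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Small stand-in dataset; X is a DataFrame with named columns.
X, y = load_iris(return_X_y=True, as_frame=True)

selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)  # plain NumPy array, column names are lost

# Boolean mask over the input columns: True where a column was kept.
mask = selector.get_support()
selected_names = X.columns[mask]

# Rebuild a labelled DataFrame from the array and the selected names.
X_selected = pd.DataFrame(X_new, columns=selected_names, index=X.index)
print(X_selected.head())
```

The selector keeps the surviving columns in their original order, so the names picked out by the mask match the columns of X_new one to one.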

Upvotes: 2
