baxx

Reputation: 4695

How to get the actual selected features from sklearn SelectKBest

Given the following data:

import pandas as pd
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
import io


df = pd.read_csv(
    io.StringIO(
        "noise_0,x0,x1,y\n1.0322600657764203,10.354468012163927,7.655143584899129,168.06121374114608\n4.478935261759052,8.786243147880384,6.244283164157256,156.570749155167\n9.085955030930956,10.450548129254543,8.084427493431185,152.10261405911672\n2.9361414837367947,10.869778308219216,9.165630427431644,129.72126680171317\n2.877753385863487,11.236593954599316,5.7987616455741575,55.294961794556315\n1.3002857211827767,9.111226379916955,10.289447419679227,308.7475968288771\n0.19366957870297075,9.753313270715008,9.803181441185592,163.337342478704\n6.788355329398909,9.752270042969856,9.004988677803736,271.9442757290742\n2.1162811600005904,8.67161845864426,9.801711898528824,158.09622149503954\n2.655466593722262,8.830913103331573,6.632544281651334,316.23912914041557\n"
    )
)

which looks as:

    noise_0         x0         x1           y
0  1.032260  10.354468   7.655144  168.061214
1  4.478935   8.786243   6.244283  156.570749
2  9.085955  10.450548   8.084427  152.102614
3  2.936141  10.869778   9.165630  129.721267
4  2.877753  11.236594   5.798762   55.294962
5  1.300286   9.111226  10.289447  308.747597
6  0.193670   9.753313   9.803181  163.337342
7  6.788355   9.752270   9.004989  271.944276
8  2.116281   8.671618   9.801712  158.096221
9  2.655467   8.830913   6.632544  316.239129

and has correlation matrix


|         |   noise_0 |        x0 |        x1 |         y |
|:--------|----------:|----------:|----------:|----------:|
| noise_0 |  1        |  0.159642 | -0.208966 | -0.02006  |
| x0      |  0.159642 |  1        | -0.197431 | -0.620964 |
| x1      | -0.208966 | -0.197431 |  1        |  0.304241 |
| y       | -0.02006  | -0.620964 |  0.304241 |  1        |

I'm interested in how I can recover the variable names x0, x1 from sklearn's feature selection.

When I try the following:

X_new = SelectKBest(f_regression, k=2).fit(df.drop("y", axis=1), df["y"])

I'm expecting this to select x0 and x1, but I'm not sure how to determine which features were actually selected.

Upvotes: 3

Views: 1663

Answers (1)

yahia

Reputation: 101

SelectKBest provides a get_support() method that can show you which features were selected.

Rearrange the code to save the SelectKBest instance (note that fit() returns the fitted selector itself, not the transformed data, so it shouldn't be assigned to X_new):

selector = SelectKBest(f_regression, k=2)
X = df.drop("y", axis=1)
selector.fit(X, df["y"])

Now, running selector.get_support() will give us:

[False,  True,  True]

We can then use selector.get_support() to mask the columns of X:

X.columns.values[selector.get_support()]

for a final output of:

['x0', 'x1']

Upvotes: 4

Related Questions