Reputation: 4500
data:
   children  pet  salary
0       4.0  cat      90
1       6.0  dog      24
2       3.0  dog      44
3       3.0  fish     27
4       2.0  cat      32
5       3.0  dog      59
6       5.0  cat      36
7       4.0  fish     27
code:
from sklearn_pandas import DataFrameMapper, cross_val_score
from sklearn.feature_selection import SelectKBest, chi2
mapper_fs = DataFrameMapper([(['children','salary'], SelectKBest(chi2, k=2))])
mapper_fs.fit_transform(data[['children','salary']], data['pet'])
result:
array([[ 90.],
[ 24.],
[ 44.],
[ 27.],
[ 32.],
[ 59.],
[ 36.],
[ 27.]])
I am trying to run sklearn feature selection on a test pandas DataFrame, but I am not able to interpret the results. I took the piece of code from the official documentation. Please suggest how to interpret the results. In particular, if I have n columns in a pandas DataFrame, how do I select the best k out of all the columns?
Upvotes: 0
Views: 4319
Reputation: 5921
If you are trying to select the k best features of your training set, what I am sure about is that you are doing it the wrong way, for many reasons, among which:

- DataFrameMapper is completely useless here
- you cannot select the k=2 best features of your data set when you have only 2 features
- you should encode data['pet'] before giving it to the fit function

Here is how you should do it:
from sklearn.feature_selection import SelectKBest, chi2
X = # your dataframe with n columns
y = # target values - encoded if categorical
# instantiate your selector
selector = SelectKBest(chi2, k=...) # k < n, try something like int(round(n/10.))
# fit it to your data
selector.fit(X, y) # returns the selector itself, but fitted
# You can transform your data using the fit_transform method if you want
# At this step you have reduced the dimensionality of your feature space; you can now perform a classification
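To make the steps above concrete, here is a minimal runnable sketch on the question's own data. It assumes LabelEncoder for encoding the categorical target (any integer encoding would do) and uses k=1 since there are only two candidate features; get_support() maps the selection back to column names, which is how you interpret the result.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'children': [4.0, 6.0, 3.0, 3.0, 2.0, 3.0, 5.0, 4.0],
    'pet': ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'],
    'salary': [90, 24, 44, 27, 32, 59, 36, 27],
})

X = data[['children', 'salary']]
# chi2 needs an encoded target and non-negative features
y = LabelEncoder().fit_transform(data['pet'])

selector = SelectKBest(chi2, k=1)  # keep the single best feature
selector.fit(X, y)

# get_support() is a boolean mask over the input columns:
# it tells you WHICH columns survived the selection
selected = X.columns[selector.get_support()]
print(list(selected))    # names of the kept columns
print(selector.scores_)  # chi2 score of every input column
```

The array the question's code printed is just fit_transform output, i.e. the values of the surviving column(s) with no column names attached; get_support() (or selector.scores_) is what recovers the mapping back to the original DataFrame columns.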
Piece of advice:
When you don't know how something works, try to read the documentation or find some tutorials online. I have never seen feature selection done with a DataFrameMapper, except in your code ...
Upvotes: 1