Reputation: 4500
data:
   children  pet  salary
0       4.0  cat      90
1       6.0  dog      24
2       3.0  dog      44
3       3.0  fish     27
4       2.0  cat      32
5       3.0  dog      59
6       5.0  cat      36
7       4.0  fish     27
code:
from sklearn_pandas import DataFrameMapper, cross_val_score
from sklearn.feature_selection import SelectKBest, chi2
mapper_fs = DataFrameMapper([(['children','salary'], SelectKBest(chi2, k=2))])
mapper_fs.fit_transform(data[['children','salary']], data['pet'])
result:
array([[ 90.],
[ 24.],
[ 44.],
[ 27.],
[ 32.],
[ 59.],
[ 36.],
[ 27.]])
I am trying to run sklearn feature selection on a test pandas DataFrame, but I am not able to interpret the results. I took the piece of code from the official documentation. Please suggest how to interpret the results. In particular, if I have n columns in a pandas DataFrame, how do I select the best k out of all the columns?
Upvotes: 0
Views: 4319
Reputation: 5921
If you are trying to select the k best features of your training set, what I am sure about is that you are doing it the wrong way, for many reasons, among which:

- DataFrameMapper is completely useless here
- you cannot select the k=2 best features of your data set when you have only 2 features
- you should encode data['pet'] before giving it to the fit function

Here is how you should do it:
from sklearn.feature_selection import SelectKBest, chi2
X = # your dataframe with n columns
y = # target values - encoded if categorical
# instantiate your selector
selector = SelectKBest(chi2, k=...) # k < n, try something like int(round(n/10.))
# fit it to your data
selector.fit(X, y) # returns the selector itself, but fitted
# You can transform your data using the fit_transform method if you want
# At this step you have reduced the dimensionality of your feature space; you can now perform a classification
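To make the steps above concrete, here is a minimal runnable sketch on the question's own data. It assumes LabelEncoder for encoding the categorical target (any integer encoding would do) and uses k=1 since there are only two candidate features; get_support() maps the selection back to column names, which is how you interpret the result.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'children': [4.0, 6.0, 3.0, 3.0, 2.0, 3.0, 5.0, 4.0],
    'pet': ['cat', 'dog', 'dog', 'fish', 'cat', 'dog', 'cat', 'fish'],
    'salary': [90, 24, 44, 27, 32, 59, 36, 27],
})

X = data[['children', 'salary']]
# chi2 needs an encoded target and non-negative features
y = LabelEncoder().fit_transform(data['pet'])

selector = SelectKBest(chi2, k=1)  # keep the single best feature
selector.fit(X, y)

# get_support() is a boolean mask over the input columns:
# it tells you WHICH columns survived the selection
selected = X.columns[selector.get_support()]
print(list(selected))    # names of the kept columns
print(selector.scores_)  # chi2 score of every input column
```

The array the question's code printed is just fit_transform output, i.e. the values of the surviving column(s) with no column names attached; get_support() (or selector.scores_) is what recovers the mapping back to the original DataFrame columns.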
Piece of advice:
When you don't know how something works, try to read the documentation or find some tutorials online. I have never seen feature selection done with a DataFrameMapper, except in your code ...
Upvotes: 1