Lucas Verra

Reputation: 73

sklearn SelectKBest: how to create a dict of {feature1: score, feature2: score, ...}

I'm trying to get a clearer picture of what SelectKBest is doing. I want the scores of ALL the features (selected or not) in a dict, so that I can plot them later.

So far I've tried:

print(selector.scores_)

which gives me

[ 18.57570327 9.34670079 10.07245453 24.46765405 6.23420114 4.20497086 8.86672154 0.21705893 11.59554766 25.09754153 7.2427304 21.06000171 5.31257143 0.1641645 1.69882435]

or

print(sorted(selector.scores_, reverse=True)[:5])

or

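from sklearn.feature_selection import SelectKBest, f_classif

# features_list[0] is the label ('poi'); the remaining entries name the columns of `features`
selector = SelectKBest(f_classif, k=5)
selectedFeatures = selector.fit(features, labels)
# get_support(indices=True) returns the column indices of the kept features; +1 skips the label
selected_features_list = [features_list[i+1] for i in selectedFeatures.get_support(indices=True)]
features_list = features_list[:1] + selected_features_list  # note: overwrites the original list
print('New feature_list after SelectKbest is\n', features_list, '\n')
print(sorted(selector.scores_, reverse=True)[:5])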

With this I can see which features were selected and what the 5 best scores are, but I can't be sure that the scores are in the same order as the feature names.

New feature_list after SelectKbest is
['poi', 'salary', 'total_stock_value', 'deferred_income', 'exercised_stock_options', 'bonus'] 

[25.097541528735491, 24.467654047526398, 21.060001707536571, 18.575703268041785, 11.595547659730601]
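Whether the sorted scores line up with the names can be checked in plain Python. As far as I know, `scores_` holds one score per input column, in column order, and `get_support(indices=True)` returns the kept column indices in ascending column order rather than best-first; that would explain why the selected names above are not listed in the same order as the sorted scores. The sketch below mimics both steps with plain lists instead of NumPy arrays. The column order of `features_list` is an assumption, reconstructed by matching the printed `scores_` against the scored list in the self-answer.

```python
# Scores copied from the question's selector.scores_ output (column order).
scores = [18.57570327, 9.34670079, 10.07245453, 24.46765405, 6.23420114,
          4.20497086, 8.86672154, 0.21705893, 11.59554766, 25.09754153,
          7.2427304, 21.06000171, 5.31257143, 0.1641645, 1.69882435]

# ASSUMED column order, reconstructed by matching scores to the answer's output.
# features_list[0] is the 'poi' label, so column i is named features_list[i + 1].
features_list = ['poi', 'salary', 'restricted_stock', 'long_term_incentive',
                 'total_stock_value', 'expenses', 'other', 'total_payments',
                 'deferral_payments', 'deferred_income', 'exercised_stock_options',
                 'loan_advances', 'bonus', 'sum_of_unclassified', 'from_messages',
                 'to_messages']

k = 5
# Column indices of the k highest scores, best first (what SelectKBest keeps).
top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
# The same indices in ascending column order, like get_support(indices=True).
support = sorted(top_k)

print([features_list[i + 1] for i in support])
# ['salary', 'total_stock_value', 'deferred_income', 'exercised_stock_options', 'bonus']
print([features_list[i + 1] for i in top_k])
# ['exercised_stock_options', 'total_stock_value', 'bonus', 'salary', 'deferred_income']
```

The first list matches the selected names printed above; the second is the same selection ranked by score.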

What I'm looking for is:

[[best_feature, best_score],
 [2nd_best_feature, 2nd_best_score],
 [3rd_best_feature, 3rd_best_score],
 ... and so on for all features]

Any ideas?
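A minimal sketch of the `[[name, score], ...]` structure, assuming the names and scores are already in the same column order. `feature_names` and `scores` are hypothetical stand-ins (in the question's setup they would presumably be the original `features_list[1:]` and `selector.scores_`); the values are a subset copied from the output above.

```python
from operator import itemgetter

# Hypothetical stand-ins for the original features_list[1:] and selector.scores_.
feature_names = ['salary', 'restricted_stock', 'long_term_incentive',
                 'total_stock_value', 'expenses']
scores = [18.57570327, 9.34670079, 10.07245453, 24.46765405, 6.23420114]

# Pair each name with its score, then sort the pairs by score, best first.
ranking = sorted(([name, score] for name, score in zip(feature_names, scores)),
                 key=itemgetter(1), reverse=True)

for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Sorting the pairs directly gives the ranked list without building a dict first; `itemgetter(1)` picks the score out of each `[name, score]` pair.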

Upvotes: 0

Views: 478

Answers (2)

Lucas Verra

Reputation: 73

Answering my own question

To create the dict:

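# features_list here must be the ORIGINAL list ('poi' label first, then one name per column),
# not the 5-feature list it was overwritten with; scores_ is in column order
all_scores_dict = {}
for i, score in enumerate(selector.scores_):
    all_scores_dict[features_list[i+1]] = score  # +1 skips the 'poi' label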
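The loop can also be written with `zip`, which removes the manual index arithmetic. This sketch assumes `original_features_list` is a hypothetical name for the full feature list as it was before the question's code overwrote it with the 5 selected features; the scores are a subset copied from the question's output.

```python
# A subset of the question's selector.scores_ output, in column order.
scores = [18.57570327, 9.34670079, 10.07245453, 24.46765405]

# Hypothetical ORIGINAL feature list: the 'poi' label first, then one name per
# column of the feature matrix (before it was overwritten with the selection).
original_features_list = ['poi', 'salary', 'restricted_stock',
                          'long_term_incentive', 'total_stock_value']

feature_names = original_features_list[1:]  # drop the label so names align with scores
if len(feature_names) != len(scores):
    # zip() would silently drop the extras and hide a misalignment
    raise ValueError("feature names and scores have different lengths")

all_scores_dict = dict(zip(feature_names, scores))
print(all_scores_dict['total_stock_value'])  # 24.46765405
```

The explicit length check matters because `zip` stops at the shorter input; on Python 3.10+ `zip(..., strict=True)` raises on a length mismatch instead.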

To sort it by score (the result is now a list of tuples):

import operator
sorted_dict_scores = sorted(all_scores_dict.items(), key=operator.itemgetter(1), reverse=True)

which gives you

[('exercised_stock_options', 25.097541528735491),
 ('total_stock_value', 24.467654047526398),
 ('bonus', 21.060001707536571),
 ('salary', 18.575703268041785),
 ('deferred_income', 11.595547659730601),
 ('long_term_incentive', 10.072454529369441),
 ('restricted_stock', 9.3467007910514877),
 ('total_payments', 8.8667215371077717),
 ('loan_advances', 7.2427303965360181),
 ('expenses', 6.2342011405067401),
 ('sum_of_unclassified', 5.31257142710212),
 ('other', 4.204970858301416),
 ('to_messages', 1.6988243485808501),
 ('deferral_payments', 0.2170589303395084),
 ('from_messages', 0.16416449823428736)]
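Since the end goal is a chart, the sorted pairs usually need to be split into one sequence of names and one of scores. A minimal sketch using `zip(*...)` to "unzip" the pairs, with the first few pairs copied from the output above:

```python
# The first few (name, score) pairs from the sorted output above.
sorted_dict_scores = [('exercised_stock_options', 25.097541528735491),
                      ('total_stock_value', 24.467654047526398),
                      ('bonus', 21.060001707536571),
                      ('salary', 18.575703268041785),
                      ('deferred_income', 11.595547659730601)]

# zip(*pairs) turns a list of pairs into one tuple of names and one of scores,
# keeping the best-first order.
names, values = zip(*sorted_dict_scores)

print(names)
# ('exercised_stock_options', 'total_stock_value', 'bonus', 'salary', 'deferred_income')
print(values)
```

With matplotlib (not part of the original post), something like `plt.bar(names, values)` should then draw the bar chart in ranked order.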

Upvotes: 0

piman314

Reputation: 5355

A word of warning: a dictionary is an unordered object, so it doesn't really make sense to do it this way, but I've included the final step for you anyway.

First, combine the feature names and scores into one object:

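# feature_names: the feature names in column order; scores: selector.scores_
# list() is needed on Python 3, where zip() returns an iterator without .sort()
combined = list(zip(feature_names, scores))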

Then sort it by score:

combined.sort(reverse=True, key=lambda x: x[1])

Then just get your data into a dictionary:

dict(combined)
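On the ordering warning: since Python 3.7, a plain dict is guaranteed to keep insertion order, so on those versions a dict built from the sorted pairs iterates best-first (on older versions, `collections.OrderedDict` gives the same behavior). A small sketch of the three steps above, with hypothetical stand-in data:

```python
# Hypothetical stand-ins for feature_names and selector.scores_.
feature_names = ['salary', 'total_stock_value', 'deferred_income']
scores = [18.57570327, 24.46765405, 11.59554766]

# Pair names with scores, sort by score (best first), then build a dict.
combined = sorted(zip(feature_names, scores), key=lambda x: x[1], reverse=True)
ranked = dict(combined)

# On Python 3.7+ the dict remembers insertion order, so it iterates best-first.
print(list(ranked))  # ['total_stock_value', 'salary', 'deferred_income']
```

If the ranking itself is what matters, keeping `combined` as a list is still the more explicit choice; the dict is mainly useful for looking up a score by feature name.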

Upvotes: 2
