Hannah Montanna

Reputation: 121

Get the best feature which gives the largest information gain

I am given this dataset:

https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data

I need to find the best feature, i.e. the one with the largest information gain. I have been computing it manually, but is there a way to calculate it with sklearn or another library?

For reference, this is the code I was writing:

import math

# Count the labels: class 2 is treated as "false", every other class as "true"
false_count = 0
true_count = 0
for label in y_train:
    if label == 2:
        false_count += 1
    else:
        true_count += 1
total = false_count + true_count

# Binary entropy in bits; an empty class contributes 0 (avoids math.log(0))
entropy = 0.0
for count in (true_count, false_count):
    if count > 0:
        p = count / total
        entropy -= p * math.log2(p)
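The code above computes only the entropy of the target. The information gain of a feature X is that entropy minus the weighted entropy left after splitting on X: IG(Y, X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v). Below is a minimal pure-Python sketch that scores every feature this way and picks the largest. The helper names and the six-row sample (two car.data attributes, buying and safety) are hypothetical and only illustrate the calculation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, feature_index):
    """H(labels) minus the weighted entropy of labels after splitting on one feature."""
    total = len(labels)
    # Group the labels by the value the feature takes in each row
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    conditional = sum(
        len(group) / total * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


# Hypothetical toy sample with two car.data attributes (buying, safety);
# real code would read all six attribute columns from car.data instead.
feature_names = ["buying", "safety"]
rows = [
    ("vhigh", "low"),
    ("vhigh", "high"),
    ("low", "low"),
    ("low", "high"),
    ("med", "med"),
    ("med", "high"),
]
labels = ["unacc", "acc", "unacc", "acc", "unacc", "acc"]

# Information gain of each feature, then the feature with the largest gain
gains = {name: information_gain(rows, labels, i) for i, name in enumerate(feature_names)}
best_feature = max(gains, key=gains.get)
print(best_feature)  # safety
```

On this toy sample, safety separates the two classes perfectly (a gain of 1 bit), while buying carries no information about the class (a gain of 0).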

Upvotes: 0

Views: 89

Answers (2)

chrisckwong821

Reputation: 1173

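If you want to calculate the entropy (cross-entropy) loss, sklearn has the function metrics.log_loss (see the official documentation). Usage example:

from sklearn.metrics import log_loss

# Y_predicted should hold predicted class probabilities (e.g. from predict_proba), not hard labels
log_loss(Y_Truth, Y_predicted, normalize=True)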
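To show what log_loss measures, here is a minimal pure-Python sketch of the binary case, assuming 0/1 labels and a predicted probability of class 1 for each sample. The helper manual_log_loss and the numbers are made up for illustration. It averages the negative natural log of the probability assigned to the true class (normalize=True returns this mean, normalize=False the sum), so it scores a model's predicted probabilities rather than ranking input features by information gain. scikit-learn's own function additionally clips probabilities and handles multiclass input.

```python
import math


def manual_log_loss(y_true, y_prob):
    """Mean negative natural-log probability assigned to the true class (binary case).

    y_true holds 0/1 labels; y_prob holds the predicted probability of class 1.
    """
    losses = [
        -math.log(p if label == 1 else 1 - p)
        for label, p in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)


# Made-up labels and predicted probabilities, only to show the arithmetic
y_true = [1, 0, 1]
y_prob = [0.9, 0.2, 0.6]
print(round(manual_log_loss(y_true, y_prob), 4))  # 0.2798
```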

Upvotes: 0

Jundiaius

Reputation: 7610

The scikit-learn documentation has a page on feature selection that describes all the tools the library provides for it.
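One tool from that page that matches the question directly is sklearn.feature_selection.mutual_info_classif: the information gain of a feature with respect to the class is the mutual information between the two. Below is a minimal sketch, assuming a recent scikit-learn and a hypothetical six-row sample with two car.data attributes (buying, safety) in place of the real file. The categories are ordinal-encoded because the function needs numeric input, and discrete_features=True makes it treat every column as categorical. The scores are in nats rather than bits, which does not change which feature ranks first.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy rows with two car.data attributes (buying, safety);
# with the real file you would pass all six attribute columns.
feature_names = ["buying", "safety"]
X_raw = np.array([
    ["vhigh", "low"],
    ["vhigh", "high"],
    ["low", "low"],
    ["low", "high"],
    ["med", "med"],
    ["med", "high"],
])
y = np.array(["unacc", "acc", "unacc", "acc", "unacc", "acc"])

# mutual_info_classif needs numeric input, so map each category to an integer code
X = OrdinalEncoder().fit_transform(X_raw)

# discrete_features=True treats every column as categorical, so the mutual
# information (information gain, here in nats) is computed from value counts
scores = mutual_info_classif(X, y, discrete_features=True)
best_feature = feature_names[int(np.argmax(scores))]
print(best_feature)  # safety
```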

From your dataset, I understand that you have a classification problem. That means the chi-squared statistic may be useful for feature selection.

Upvotes: 1
