Reputation: 121
I am given this dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data
I have to find the best feature, i.e. the one with the largest information gain. I was doing this manually, but is there a way to have it calculated using sklearn or any other library?
Just for reference, this is the code I was writing:
import math

# Count how many training labels fall into each of the two classes.
false_count = 0.0
true_count = 0.0
total = 0.0
for x in range(len(y_train)):
    if y_train[x] == 2:
        false_count += 1
    else:
        true_count += 1
    total += 1

# Shannon entropy of the class distribution, in bits.
entropy = (-(true_count / total) * math.log(true_count / total, 2)
           - (false_count / total) * math.log(false_count / total, 2))
Upvotes: 0
Views: 89
Reputation: 1173
If you want to calculate the entropy loss, sklearn has the function metrics.log_loss; see the official documentation.
Usage example:
log_loss(Y_Truth, Y_predicted, normalize=True)
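A minimal runnable sketch of that call, with made-up labels and predicted probabilities purely for illustration:

from sklearn.metrics import log_loss

# Ground-truth binary labels and the model's predicted probability of class 1
# (both lists are invented for this example).
y_truth = [0, 1, 1, 0]
y_predicted = [0.1, 0.9, 0.8, 0.3]

# normalize=True returns the mean cross-entropy per sample.
print(log_loss(y_truth, y_predicted, normalize=True))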
Upvotes: 0
Reputation: 7610
There is a page in the Scikit-Learn docs that explains all the feature-selection utilities available in the library.
From your dataset, I understand that you have a classification problem, which means the chi-square statistic may be useful for feature selection, as sketched below.
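For example, here is a minimal sketch of scoring the car.data features with SelectKBest and chi2. The column names follow the attribute list on the UCI page, and the ordinal encoding is just one convenient way to turn the categorical features into the non-negative numbers chi2 requires:

import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import OrdinalEncoder

# car.data has no header row, so name the columns per the UCI description.
columns = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
    names=columns,
)

# chi2 needs non-negative numeric features, so encode the categorical columns.
X = OrdinalEncoder().fit_transform(df.drop(columns="class"))
y = df["class"]

# Score every feature and keep the two with the highest chi-square statistic.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print(dict(zip(columns[:-1], selector.scores_)))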
Upvotes: 1