Reputation: 241
This is super simple, but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful, and I was following everything about entropy and information gain until I got to the part where the entropy of each individual attribute is calculated.
I don't understand how the entropy for each individual attribute (sunny, windy, rainy) is calculated, specifically how p_i is calculated. It seems different from the way it is calculated for Entropy(S). Can anyone explain the process behind this calculation?
Upvotes: 1
Views: 4268
Reputation: 31
Calculate the proportion that sunny represents in set S, i.e., |sunnyInstances| / |S| = 3/10 = 0.3.
Apply the entropy formula to the sunny subset only. There are 3 sunny instances split across 2 classes: 2 associated with Tennis and 1 with Cinema. So the entropy formula for sunny works out to: -2/3 log2(2/3) - 1/3 log2(1/3) ≈ 0.918
And so on.
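Here is a minimal sketch of that per-attribute calculation in Python (the sunny_instances list is hypothetical data, chosen only to match the counts above):

import math

# Hypothetical class labels of the 3 sunny instances: 2 Tennis, 1 Cinema.
sunny_instances = ["Tennis", "Tennis", "Cinema"]

def subset_entropy(labels):
    # Entropy of a list of class labels: -sum over classes of p_i * log2(p_i),
    # where p_i is the proportion of this subset belonging to class i.
    total = len(labels)
    h = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        h -= p * math.log2(p)
    return h

print(subset_entropy(sunny_instances))  # -> 0.918...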
Upvotes: 3
Reputation: 380
To split a node into two different child nodes, one method consists of splitting the node on the variable that maximises your information gain.
When you reach a pure leaf node, the information gain equals 0 (because you can't gain any information by splitting a node whose instances all belong to a single class - logic).
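For instance, a node whose instances are all Tennis has entropy -(1) * log2(1) = 0, so splitting it cannot produce any gain.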
In your example, Entropy(S) = 1.571 is your current entropy, the one you have before splitting. Let's call it HBase.
Then you compute the entropy of the child nodes for each parameter you could split on.
To get your information gain, you subtract the weighted entropy of your child nodes from HBase:
-> gain = HBase - (child1NumRows / numOfRows) * entropyChild1 - (child2NumRows / numOfRows) * entropyChild2
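With the numbers from the question, a sunny / not-sunny split of the 10 instances would give gain = 1.571 - (3/10) * 0.918 - (7/10) * entropy(notSunny), where entropy(notSunny) is the entropy of the 7 remaining instances (not computed above).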
import math

def GetEntropy(dataSet):
    # Count of rows per class label in this data set.
    results = ResultsCounts(dataSet)
    h = 0.0  # h => entropy
    for i in results.keys():
        p = float(results[i]) / NbRows(dataSet)  # class proportion p_i
        h = h - p * math.log2(p)
    return h
def GetInformationGain(dataSet, currentH, child1, child2):
    # Weight of the first child = its share of the parent's rows.
    p = float(NbRows(child1)) / NbRows(dataSet)
    # Gain = parent entropy minus the weighted child entropies.
    gain = currentH - p * GetEntropy(child1) - (1 - p) * GetEntropy(child2)
    return gain
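If you want to run this, here is a hypothetical way to wire it up; the ResultsCounts and NbRows helpers are assumed implementations, since the answer does not show them:

def ResultsCounts(dataSet):
    # Assumed helper: count rows per class label, taking the label
    # to be the last column of each row.
    counts = {}
    for row in dataSet:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return counts

def NbRows(dataSet):
    # Assumed helper: number of rows in the data set.
    return len(dataSet)

# Toy [weather, activity] rows loosely following the example above.
data = [["sunny", "Tennis"], ["sunny", "Tennis"], ["sunny", "Cinema"],
        ["windy", "Cinema"], ["windy", "Shopping"]]
sunny = [row for row in data if row[0] == "sunny"]
rest = [row for row in data if row[0] != "sunny"]
print(GetInformationGain(data, GetEntropy(data), sunny, rest))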
The objective is to pick the split that gives the best of all the information gains!
Hope this helps! :)
Upvotes: 1