Reputation: 426
I'm trying to implement the ID3 or C4.5 algorithm.
According to ID3, the information gain is calculated as follows.
For example, given training data like this:
credit     age     label
normal     young   yes
normal     old     yes
bad        old     no
excellent  middle  yes
The IG of credit is computed like this: IG(credit) = H(D) - P(credit==normal)H(D|credit==normal) - P(credit==bad)H(D|credit==bad) - P(credit==excellent)H(D|credit==excellent)
When I choose the credit as the best feature to split, in the following procedure, I will not consider the attribute "credit" again.
However, I have also seen it implemented like this:
IG(credit=normal) = H(D) - P(credit==normal)H(D|credit==normal) - P(credit ~= normal)H(D|credit ~= normal)
When I choose credit == normal as the best split, in the following procedure I will consider the attribute "credit" again, e.g. credit == "bad".
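The binary-split version only separates "equals the value" from "everything else". A minimal sketch, reusing the same entropy helper and example data as above (names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain_binary(rows, attr_index, value, labels):
    """Binary IG: split into attr == value vs attr != value."""
    n = len(labels)
    in_lab = [labels[i] for i, r in enumerate(rows) if r[attr_index] == value]
    out_lab = [labels[i] for i, r in enumerate(rows) if r[attr_index] != value]
    return (entropy(labels)
            - (len(in_lab) / n) * entropy(in_lab)
            - (len(out_lab) / n) * entropy(out_lab))

rows = [("normal", "young"), ("normal", "old"),
        ("bad", "old"), ("excellent", "middle")]
labels = ["yes", "yes", "no", "yes"]

# The "not normal" branch still mixes yes/no (H = 1 bit), so the
# binary gain is lower than the multiway gain on this data:
print(round(info_gain_binary(rows, 0, "normal", labels), 4))  # -> 0.3113
```

Note the numbers are not comparable across the two schemes: the multiway split consumes the whole attribute in one step, while the binary split keeps the remaining values available deeper in the tree.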
The two IG calculation procedures yield different trees: one is a non-binary (multiway) tree, the other is a binary tree.
My question is: are the two trees equivalent? When I test on the two trees, will the results always be the same? Is one better than the other, or is it hard to say, depending on the data?
Upvotes: 0
Views: 1122
Reputation: 4490
As you have mentioned, one tree performs a multiway split and the other a binary split. The two trees are definitely NOT equivalent, hence the test results will not always be the same either. But the accuracy in both cases could be in a similar range. As for your last two questions, which model is better depends on your data.
Upvotes: 0