Reputation: 109
I'm doing the exercises in Introduction to Data Mining and got stuck on the following question about decision trees:
The question asks me to calculate the generalization error rate using the optimistic and pessimistic approaches, and the answers are 0.3 and 0.5 respectively. These are totally different from my answers, 0.5 and 0.7. By my calculation, instances 3, 7, 8, 9, and 10 are misclassified. I have searched a lot of documentation on Google, and none of it explains why; it just shows that 3 / 10 = 0.3. Please tell me what mistake I made. Thanks!
Upvotes: 3
Views: 12105
Reputation: 11
Your answer is right. The tree predicts '+' iff (not A && not B) || (A && not C).
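To make that rule easy to check by hand, here is a minimal sketch that encodes it as a function and prints the prediction for every combination of A, B and C. The name `predict` is just illustrative, and I'm assuming the three attributes are binary (0/1).

```python
def predict(a: int, b: int, c: int) -> str:
    """Return '+' or '-' using the rule (not A and not B) or (A and not C)."""
    is_positive = (not a and not b) or (a and not c)
    return "+" if is_positive else "-"


# Enumerate all eight attribute combinations so each row of the
# exercise's table can be compared against the tree's prediction.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(f"A={a} B={b} C={c} -> {predict(a, b, c)}")
```

Comparing each printed prediction with the class column of the training table gives the number of misclassified instances.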
Upvotes: 1
Reputation: 21
I think your answers are right, the solution manual's answer is wrong, and you've made an error while reproducing the tree here - in my copy of the book, the leaf node labels read, from left to right, +, -, +, -. Your tree, with leaf nodes +, -, -, +, does lead to 30% and 50% for the optimistic and pessimistic error estimates, respectively.
Using leaf nodes +, -, +, -, the errors are indeed 50% and 70%.
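For reference, this is roughly how both pairs of numbers come out. It is a sketch under my own assumptions: the pessimistic estimate adds a penalty of 0.5 per leaf node, the tree has four leaves, and there are 10 training instances. The function names are made up for illustration.

```python
def optimistic_error(train_errors: int, n_instances: int) -> float:
    """Optimistic estimate: the training error rate itself."""
    return train_errors / n_instances


def pessimistic_error(train_errors: int, n_leaves: int, n_instances: int,
                      penalty: float = 0.5) -> float:
    """Pessimistic estimate: training errors plus a per-leaf penalty.

    The 0.5 penalty per leaf is an assumption about the exercise's setup.
    """
    return (train_errors + penalty * n_leaves) / n_instances


N_INSTANCES = 10  # assumed size of the training set
N_LEAVES = 4      # assumed number of leaf nodes in the tree

# 3 misclassified instances (leaf labels +, -, -, +)
print(optimistic_error(3, N_INSTANCES), pessimistic_error(3, N_LEAVES, N_INSTANCES))
# 5 misclassified instances (leaf labels +, -, +, -)
print(optimistic_error(5, N_INSTANCES), pessimistic_error(5, N_LEAVES, N_INSTANCES))
```

With 3 training errors this gives 0.3 and 0.5; with 5 training errors it gives 0.5 and 0.7, so the two sets of answers differ only in how many instances the tree misclassifies.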
Upvotes: 2
Reputation: 66805
You got this wrong. The misclassified instances are:
Your decision tree is:
Thus, for example, for instance 3:
A=0, B=1, C=0, class=+, and your DT returns - because A=0 and B=1.
Upvotes: 0