How to calculate the generalization error rate of a decision tree

I'm working through the exercises in Introduction to Data Mining and got stuck on the following question about a decision tree:

Training

Testing

Decision tree

The question asks me to calculate the generalization error rate using the optimistic and pessimistic approaches, and the answers are 0.3 and 0.5 respectively. These differ from my answers of 0.5 and 0.7. By my calculation, instances 3, 7, 8, 9, and 10 are misclassified. I have searched through a lot of documentation on Google, but none of it explains why; it just shows that 3 / 10 = 0.3. Please tell me what mistake I made. Thanks!

Upvotes: 3

Answers (3)

Kunal Lalwani

Reputation: 11

Your answer is right. The tree predicts '+' iff (not A && not B) || (A && not C).

Upvotes: 1

Lane Christiansen

Reputation: 21

I think your answers are right and the solution manual's answer is wrong, but you made an error while reproducing the tree here. In my copy of the book, the leaf node labels read, from left to right, +, -, +, -. Your tree, with leaf nodes +, -, -, +, does lead to 30% and 50% for the optimistic and pessimistic error estimates, respectively.

Using leaf nodes +, -, +, -, the errors are indeed 50% and 70%.
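As a rough check of the arithmetic, here is a minimal Python sketch. It assumes the pessimistic estimate adds a penalty of 0.5 per leaf node to the training error count (which is consistent with the figures quoted in this thread), that the tree has 4 leaves, and that there are 10 training instances. The function names are my own and not from the book.

```python
def optimistic_error(n_errors, n_train):
    """Optimistic estimate: the training error rate itself."""
    return n_errors / n_train


def pessimistic_error(n_errors, n_leaves, n_train, penalty=0.5):
    """Pessimistic estimate: training errors plus a per-leaf penalty.

    The default penalty of 0.5 per leaf is an assumption.
    """
    return (n_errors + penalty * n_leaves) / n_train


N_TRAIN = 10   # training instances (from the 3 / 10 in the question)
N_LEAVES = 4   # leaf nodes in the tree

# Training error counts for the two leaf labelings discussed above.
cases = {
    "+, -, -, +": 3,  # tree as posted in the question
    "+, -, +, -": 5,  # tree as printed in my copy of the book
}

for leaves, n_errors in cases.items():
    opt = optimistic_error(n_errors, N_TRAIN)
    pes = pessimistic_error(n_errors, N_LEAVES, N_TRAIN)
    print(f"leaves {leaves}: optimistic={opt:.1f}, pessimistic={pes:.1f}")
```

Under these assumptions, the two labelings differ only in the number of training errors (3 versus 5); the leaf penalty of 4 * 0.5 = 2 is the same for both.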

Upvotes: 2

lejlot

Reputation: 66805

You got this wrong. The misclassified instances are:

in training: 3, 5, 6
in testing: 12, 13, 14, 15

Your decision tree is:

return + iff (not a and not b) or (a and c)

Thus, for example, for instance 3:

A=0, B=1, C=0, class=+, and your decision tree returns - because A=0 and B=1.
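The same check can be sketched in Python. This is only an illustration: `predict` is a hypothetical helper that encodes the rule above, and the only data used is instance 3 as listed here.

```python
def predict(a, b, c):
    """Rule from this answer: '+' iff (not a and not b) or (a and c)."""
    return "+" if (not a and not b) or (a and c) else "-"


# Instance 3 as given above: A=0, B=1, C=0, true class '+'.
instance_3 = {"a": 0, "b": 1, "c": 0, "label": "+"}

predicted = predict(instance_3["a"], instance_3["b"], instance_3["c"])
misclassified = predicted != instance_3["label"]

print(f"predicted={predicted}, actual={instance_3['label']}, "
      f"misclassified={misclassified}")
```

Running the same function over every row of the training table and counting the mismatches gives the training error count used in the error estimates.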

Upvotes: 0
