michalk

Reputation: 1587

Root node error in classification tree model

I'm struggling to understand the output of a classification tree in rpart. I don't understand how the 'Root node error' (one of the values printed by the printcp function) is calculated. I couldn't find its definition in the rpart package documentation either.

As an example, I loaded the Titanic data:

library(titanic)
library(rpart)

tt<-titanic_train
table(tt$Survived)

So we have 549 people who died (Survived = 0) and 342 people who survived (Survived = 1), 891 people in total.

fit <- rpart(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data = tt)
printcp(fit)

This gives:

Regression tree:
rpart(formula = Survived ~ Pclass + Sex + Age + SibSp + Parch + 
    Fare + Embarked, data = tt)

Variables actually used in tree construction:
[1] Age    Fare   Pclass Sex    SibSp 

Root node error: 210.73/891 = 0.23651

n= 891 

        CP nsplit rel error  xerror     xstd
1 0.295231      0   1.00000 1.00538 0.016124
2 0.073942      1   0.70477 0.70896 0.033228
3 0.027124      2   0.63083 0.63570 0.031752
4 0.026299      3   0.60370 0.62105 0.032815
5 0.023849      4   0.57740 0.61154 0.032884
6 0.021091      5   0.55356 0.58294 0.032127
7 0.010000      6   0.53246 0.57097 0.032402

Here the root node error means the misclassification error at the start, before any splits are added, am I right? So if I predict the majority class for everyone, I will be wrong in 342 cases out of 891, and the root node error should be 342/891. But the output shows 210.73/891.

I would be grateful for help understanding what 210.73 means in the root node error and how it was calculated for this Titanic data. I have been searching all day and can't find an explanation.

Thank you in advance for your help.

Upvotes: 3

Views: 7920

Answers (1)

Sarah Grogan

Reputation: 137

For a classification tree, the root node error is the proportion of incorrectly classified records at the first (root) node, before any split is made.

For more information see Understanding the Outputs of the Decision Tree Tool.
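
Note that the printcp output in the question starts with "Regression tree:", not "Classification tree:". Survived is stored as a numeric 0/1 column, and when no method is given rpart falls back to method = "anova" for a non-factor response. In that case the root node error is the sum of squared deviations of the response from its mean, divided by n. The sketch below reproduces both numbers in base R; the vector y is a stand-in built only from the class counts quoted in the question (549 and 342), not the real data.

```r
# Stand-in response with the same class counts as table(tt$Survived):
# 549 zeros and 342 ones (assumption: only the counts matter at the root).
y <- c(rep(0, 549), rep(1, 342))
n <- length(y)

# Regression (anova) tree: root deviance is the sum of squared
# deviations from the root mean.
root_sse <- sum((y - mean(y))^2)
root_sse        # about 210.73
root_sse / n    # about 0.23651

# Classification tree: root error is the number of records outside
# the majority class.
root_misclass <- min(table(y))
root_misclass       # 342
root_misclass / n   # about 0.38384
```

To get the misclassification-based value, the response can be turned into a factor, for example rpart(factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data = tt, method = "class"). printcp should then start with "Classification tree:" and report a root node error of 342/891.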

Upvotes: 1
