Reputation: 9803
library(rpart)
library(rpart.plot)  # provides prp()

train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels = c("very inactive", "inactive", "active", "very active"),
                                      ordered = TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)

prp(mytree, type = 4, extra = 101, leaf.round = 0, fallen.leaves = TRUE,
    varlen = 0, tweak = 1.2)
Then by using printcp I can see the cross-validation results:
> printcp(mytree)

Classification tree:
rpart(formula = Fraud ~ RearEnd + Whiplash + Activity, data = train,
    method = "class", minsplit = 2, minbucket = 1, cp = -1)

Variables actually used in tree construction:
[1] Activity RearEnd  Whiplash

Root node error: 5/10 = 0.5

n= 10

     CP nsplit rel error xerror xstd
1   0.6      0       1.0    2.0  0.0
2   0.2      1       0.4    0.4  0.3
3  -1.0      3       0.0    0.4  0.3
So the root node error is 0.5, and my understanding is that this is the misclassification error. But I'm having trouble calculating the sensitivity (proportion of true positives correctly identified) and specificity (proportion of true negatives correctly identified). How can I calculate those from the rpart output?
(The above example is from http://gormanalysis.com/decision-trees-in-r-using-rpart/)
Upvotes: 4
Views: 5035
Reputation: 91
The root node error is the misclassification error at the root of the tree, i.e. the misclassification error before any splits are made. It is not the misclassification error of the final tree.
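You can verify this directly from the rpart output (a minimal sketch using the train data from the question): the root node error is just the error rate of always predicting the majority class, and the rel error and xerror columns of printcp() are reported relative to it.

# misclassification rate at the root: predict the majority class for every row
root_error <- 1 - max(table(train$Fraud)) / nrow(train)
root_error                                   # 0.5, matching "Root node error: 5/10 = 0.5"

# rel error and xerror are scaled by the root node error, so the absolute
# error rates for each tree size are:
mytree$cptable[, "rel error"] * root_error   # training misclassification rate
mytree$cptable[, "xerror"]    * root_error   # cross-validated misclassification rate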
Upvotes: 0
Reputation: 37889
You can use the caret package to do so:
Data:
library(rpart)

train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels = c("very inactive", "inactive", "active", "very active"),
                                      ordered = TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)
Solution
library(caret)

# calculate predictions (a matrix of class probabilities for a classification tree)
preds <- predict(mytree, train)

# calculate sensitivity
> sensitivity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1

# calculate specificity
> specificity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1
Both sensitivity and specificity take the predictions as the first argument and the observed values (the response variable, i.e. train$Fraud) as the second argument.
According to the documentation, both the predictions and the observed values need to be passed to the functions as factors that have the same levels.
Both specificity and sensitivity in this case are 1 since the predictions are 100% accurate.
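If you also want the full confusion matrix, one alternative (a minimal sketch; here Fraud == TRUE is explicitly treated as the positive class) is to predict hard class labels with type = "class" and tabulate them against the observed values, or let caret::confusionMatrix report sensitivity and specificity in one call:

# predict hard class labels instead of probabilities
pred_class <- predict(mytree, train, type = "class")

# confusion matrix: rows = predicted, columns = observed
tab <- table(Predicted = pred_class, Observed = train$Fraud)
tab

# sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
sens <- tab["TRUE", "TRUE"]   / sum(tab[, "TRUE"])
spec <- tab["FALSE", "FALSE"] / sum(tab[, "FALSE"])
c(sensitivity = sens, specificity = spec)

# or let caret do the bookkeeping, with the positive class set explicitly
confusionMatrix(factor(pred_class), factor(train$Fraud), positive = "TRUE")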
Upvotes: 2