Reputation: 9803
library(rpart)
library(rpart.plot)  # provides prp()

train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels = c("very inactive", "inactive", "active", "very active"),
                                      ordered = TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)

prp(mytree, type = 4, extra = 101, leaf.round = 0, fallen.leaves = TRUE,
    varlen = 0, tweak = 1.2)
Then by using printcp I can see the cross-validation results:
> printcp(mytree)

Classification tree:
rpart(formula = Fraud ~ RearEnd + Whiplash + Activity, data = train,
    method = "class", minsplit = 2, minbucket = 1, cp = -1)

Variables actually used in tree construction:
[1] Activity RearEnd  Whiplash

Root node error: 5/10 = 0.5

n= 10

     CP nsplit rel error xerror xstd
1   0.6      0       1.0    2.0  0.0
2   0.2      1       0.4    0.4  0.3
3  -1.0      3       0.0    0.4  0.3
So the root node error is 0.5, and my understanding is that this is the misclassification error. But I'm having trouble calculating the sensitivity (proportion of true positives correctly identified) and specificity (proportion of true negatives correctly identified). How can I calculate those from the rpart output?
(The above example is from http://gormanalysis.com/decision-trees-in-r-using-rpart/)
Upvotes: 4
Views: 5035
Reputation: 91
The root node error is the misclassification error at the root of the tree, i.e. the misclassification error before any splits are made. It is not the misclassification error of the final tree.
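You can verify this directly from the rpart output (a minimal sketch using the train data from the question): the root node error is just the error rate of always predicting the majority class, and the rel error and xerror columns of printcp() are reported relative to it.

# misclassification rate at the root: predict the majority class for every row
root_error <- 1 - max(table(train$Fraud)) / nrow(train)
root_error                                   # 0.5, matching "Root node error: 5/10 = 0.5"

# rel error and xerror are scaled by the root node error, so the absolute
# error rates for each tree size are:
mytree$cptable[, "rel error"] * root_error   # training misclassification rate
mytree$cptable[, "xerror"]    * root_error   # cross-validated misclassification rate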
Upvotes: 0
Reputation: 37889
You can use the caret package to do so:
Data:
library(rpart)

train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels = c("very inactive", "inactive", "active", "very active"),
                                      ordered = TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))

mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)
Solution
library(caret)

# calculate predictions (a matrix of class probabilities for a classification tree)
preds <- predict(mytree, train)

# calculate sensitivity
> sensitivity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1

# calculate specificity
> specificity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1
Both sensitivity and specificity take the predictions as the first argument and the observed values (the response variable, i.e. train$Fraud) as the second argument.
According to the documentation, both the predictions and the observed values need to be passed to the functions as factors that have the same levels.
Both specificity and sensitivity in this case are 1 since the predictions are 100% accurate.
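If you also want the full confusion matrix, one alternative (a minimal sketch; here Fraud == TRUE is explicitly treated as the positive class) is to predict hard class labels with type = "class" and tabulate them against the observed values, or let caret::confusionMatrix report sensitivity and specificity in one call:

# predict hard class labels instead of probabilities
pred_class <- predict(mytree, train, type = "class")

# confusion matrix: rows = predicted, columns = observed
tab <- table(Predicted = pred_class, Observed = train$Fraud)
tab

# sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
sens <- tab["TRUE", "TRUE"]   / sum(tab[, "TRUE"])
spec <- tab["FALSE", "FALSE"] / sum(tab[, "FALSE"])
c(sensitivity = sens, specificity = spec)

# or let caret do the bookkeeping, with the positive class set explicitly
confusionMatrix(factor(pred_class), factor(train$Fraud), positive = "TRUE")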
Upvotes: 2