Adrian

Reputation: 9803

R: how to calculate sensitivity and specificity of rpart tree

library(rpart)
train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels=c("very inactive", "inactive", "active", "very active"),
                                      ordered=TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train, method = "class", minsplit = 2, minbucket = 1, cp=-1)
library(rpart.plot)   # prp() comes from the rpart.plot package
prp(mytree, type = 4, extra = 101, leaf.round = 0, fallen.leaves = TRUE, 
    varlen = 0, tweak = 1.2)

[plot of the fitted classification tree produced by prp()]

Then, using printcp(), I can see the cross-validation results:

> printcp(mytree)

Classification tree:
rpart(formula = Fraud ~ RearEnd + Whiplash + Activity, data = train, 
    method = "class", minsplit = 2, minbucket = 1, cp = -1)

Variables actually used in tree construction:
[1] Activity RearEnd  Whiplash

Root node error: 5/10 = 0.5

n= 10 

    CP nsplit rel error xerror xstd
1  0.6      0       1.0    2.0  0.0
2  0.2      1       0.4    0.4  0.3
3 -1.0      3       0.0    0.4  0.3

So the root node error is 0.5, which from my understanding is the misclassification error. But I'm having trouble calculating the sensitivity (the proportion of actual positives correctly identified) and specificity (the proportion of actual negatives correctly identified). How can I calculate those based on the rpart output?

(The above example is from http://gormanalysis.com/decision-trees-in-r-using-rpart/)
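To make the question concrete, this is the kind of confusion matrix I would want to derive those two rates from (just a sketch; I'm assuming predict() with type = "class" is the right way to pull the predicted classes out of the tree):

# in-sample predicted classes vs. actual labels
pred_class <- predict(mytree, train, type = "class")
table(predicted = pred_class, actual = train$Fraud)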

Upvotes: 4

Views: 5035

Answers (2)

Sally

Reputation: 91

The root node error is the misclassification error at the root of the tree, i.e. the misclassification error before any splits are made. It is not the misclassification error of the final tree.
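This is easy to check against the data in the question: with no splits the tree predicts a single class for every claim, so the root node error is just the proportion of observations in the other class (a quick sanity check, reusing the train data frame from the question):

# Fraud is split 5 TRUE / 5 FALSE, so predicting one class for everything
# misclassifies 5 of the 10 observations
table(train$Fraud)
min(table(train$Fraud)) / nrow(train)   # 0.5, matching "Root node error: 5/10 = 0.5"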

Upvotes: 0

LyzandeR

Reputation: 37889

You can use the caret package to do so:

Data:

library(rpart)
train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels=c("very inactive", "inactive", "active", "very active"),
                                      ordered=TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train, method = "class", minsplit = 2, minbucket = 1, cp=-1)

Solution

library(caret)

#calculate predictions (class probabilities; column 2 is the probability of Fraud = TRUE)
> preds <- predict(mytree, train)

#calculate sensitivity
> sensitivity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1

#calculate specificity
> specificity(factor(preds[,2]), factor(as.numeric(train$Fraud)))
[1] 1

Both sensitivity() and specificity() take the predictions as the first argument and the observed values (the response variable, i.e. train$Fraud) as the second argument.

According to the documentation both the predictions and the observed values need to be fed to the functions as factors that have the same levels.

In this case both sensitivity and specificity are 1, since the tree's predictions on the training data are 100% accurate.
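To see where those 1s come from, you can also cross-check by hand with a confusion matrix built from the predicted classes (a sketch in base R; here I take TRUE as the positive class):

# manual cross-check: confusion matrix of predicted vs. actual classes
pred_class <- predict(mytree, train, type = "class")
cm <- table(predicted = pred_class, actual = train$Fraud)
cm

# with TRUE as the positive class
TP <- cm["TRUE", "TRUE"];  FN <- cm["FALSE", "TRUE"]
TN <- cm["FALSE", "FALSE"]; FP <- cm["TRUE", "FALSE"]

TP / (TP + FN)   # sensitivity (true positive rate)
TN / (TN + FP)   # specificity (true negative rate)

Since the tree classifies every training row correctly, both ratios come out as 1, matching the caret output above.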

Upvotes: 2
