Reputation: 2439
I'm using the following code to calculate sensitivity, specificity, NPV and PPV with a randomForest classifier.
suppressMessages(require(randomForest))
classifier <- randomForest(x.train, y.train, ntree=300, importance=TRUE)
# type="response" returns hard class labels, not probabilities
prediction <- predict(classifier, x.test, type="response")
suppressMessages(require(caret))
accuracyData <- confusionMatrix(prediction, y.test)
In accuracyData I have all the information about the prediction quality (sensitivity, specificity, etc.).
However, I'd like to make these calculations for different thresholds, and I don't see how to specify such a value in my code.
Upvotes: 2
Views: 4307
Reputation: 206421
The problem is that when you predict a "response", you are making a dichotomous decision and losing the information about your uncertainty. At that point a threshold has already been applied to make the decision. If you want to try different thresholds, you should output the class probabilities instead. For example:
#sample data
set.seed(15)
x<- matrix(runif(100,0,5), ncol=1)
y<- 3-2*x[,1] + rnorm(100, 2, 2)
y<- factor(ifelse(y>median(y), "A","B"))
x.train<-x[1:50,, drop=F]
y.train<-y[1:50]
x.test<-x[-(1:50),,drop=F]
y.true<-y[-(1:50)]
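#fit the model
library(randomForest)
classifier <- randomForest(x.train, y.train, ntree=500, importance=TRUE)
#type="prob" returns one column of class probabilities per level ("A", "B")
prediction <- predict(classifier, x.test, type="prob")
#calculate performance
library(pROC)
#prediction[,1] is the predicted probability of class "A"
mroc <- roc(y.true, prediction[,1], plot=TRUE)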
And then we can calculate the values of interest for different thresholds:
coords(mroc, .5, "threshold", ret=c("sensitivity","specificity","ppv","npv"))
# sensitivity specificity ppv npv
# 0.7586207 0.8095238 0.8461538 0.7083333
coords(mroc, .9, "threshold", ret=c("sensitivity","specificity","ppv","npv"))
# sensitivity specificity ppv npv
# 0.9655172 0.6666667 0.8000000 0.9333333
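To make the role of the threshold explicit, here is a minimal base-R sketch that applies a cutoff to the predicted probabilities by hand and computes the same four quantities. The helper `metrics_at` and the toy vectors are hypothetical names and made-up values for illustration; they are not part of randomForest, caret, or pROC.

```r
# Hypothetical helper (not from any package).
# prob:     predicted probability of the positive class
# truth:    true labels
# positive: label treated as the positive class
# cutoff:   an observation is called positive when prob >= cutoff
metrics_at <- function(prob, truth, positive, cutoff) {
  pred.pos <- prob >= cutoff
  true.pos <- truth == positive
  tp <- sum(pred.pos & true.pos)    # true positives
  fp <- sum(pred.pos & !true.pos)   # false positives
  fn <- sum(!pred.pos & true.pos)   # false negatives
  tn <- sum(!pred.pos & !true.pos)  # true negatives
  c(cutoff      = cutoff,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

# Toy data, made up purely for illustration
prob  <- c(0.95, 0.80, 0.65, 0.40, 0.30, 0.10)
truth <- factor(c("A", "A", "B", "A", "B", "B"))

# One row of metrics per cutoff
cutoffs <- c(0.25, 0.5, 0.75)
result <- t(sapply(cutoffs, function(k) metrics_at(prob, truth, "A", k)))
print(result)
```

With the objects from the example above, a call would look something like `metrics_at(prediction[, "A"], y.true, "A", 0.5)`. The numbers need not match the `coords()` output, because `roc()` chooses for itself which level counts as the case and in which direction to compare, so check `mroc$levels` and `mroc$direction` before comparing the two.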
Upvotes: 6