Reputation: 1311
How can I calculate the AUC value for a ranger model? ranger is a fast implementation of the random forest algorithm in R. I'm using the following code to build a ranger model for classification and get predictions from it:
library(ranger)

# Build the model using the ranger() function
ranger.model <- ranger(formula, data = data_train, importance = 'impurity',
                       write.forest = TRUE, num.trees = 3000,
                       mtry = sqrt(length(currentComb)),
                       classification = TRUE)
# Get the predictions from the ranger model on the test set
pred.data <- predict(ranger.model, data = data_test)
table(pred.data$predictions)
But I don't know how to calculate the AUC value from these predictions. Any ideas?
Upvotes: 3
Views: 3110
Reputation: 13691
The key to computing AUC is having a way to rank your test samples from "most likely to be positive" to "least likely to be positive". Modify your training call to include probability = TRUE. pred.data$predictions should now be a matrix of class probabilities. Make note of the column that corresponds to your "positive" class. This column provides the ranking we need to compute AUC.
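As a minimal sketch of what that change looks like, the snippet below fits a probability forest on a stand-in data set. The two-class subset of iris, the object names (demo_train, demo_test, demo_model, demo_pred) and the choice of "virginica" as the positive class are all assumptions made for illustration; they are not part of the original question.

```r
library(ranger)

# Hypothetical two-class data: iris with the "setosa" class dropped
set.seed(42)
two_class  <- droplevels(subset(iris, Species != "setosa"))
test_idx   <- sample(nrow(two_class), 30)
demo_train <- two_class[-test_idx, ]
demo_test  <- two_class[test_idx, ]

# probability = TRUE grows a probability forest instead of a
# hard-label classification forest
demo_model <- ranger(Species ~ ., data = demo_train,
                     num.trees = 500, probability = TRUE)

demo_pred <- predict(demo_model, data = demo_test)

# predictions is now a matrix with one column per class,
# named after the factor levels of the response
colnames(demo_pred$predictions)

# Assumption: "virginica" is treated as the positive class
scores <- demo_pred$predictions[, "virginica"]
lbls   <- as.integer(demo_test$Species == "virginica")
```

The scores and lbls vectors built at the end are in the form that a rank-based AUC function expects: one score per test sample and a matching 0/1 label.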
To actually compute AUC, we will use Equation (3) from Hand and Till, 2001. We can implement this equation as follows:
## An AUC estimate that doesn't require explicit construction of an ROC curve
auc <- function(scores, lbls)
{
  stopifnot(length(scores) == length(lbls))
  jp <- which(lbls > 0);  np <- length(jp)   # indices and count of positives
  jn <- which(lbls <= 0); nn <- length(jn)   # indices and count of negatives
  s0 <- sum(rank(scores)[jp])                # sum of the ranks of the positives
  (s0 - np * (np + 1) / 2) / (np * nn)
}
where scores would be the column of pred.data$predictions that corresponds to the positive class, and lbls are the corresponding test labels encoded as a binary vector (1 for positive, 0 or -1 for negative).
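To see the function in action, here is a small self-contained check on made-up data. The toy_scores and toy_lbls vectors and the pairwise_auc helper are hypothetical and exist only for this illustration; the auc function is repeated so the snippet runs on its own. The helper computes AUC the slow way, as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, counting ties as one half, which should agree with the rank-based estimate.

```r
# AUC function from the answer, repeated so this snippet is self-contained
auc <- function(scores, lbls)
{
  stopifnot(length(scores) == length(lbls))
  jp <- which(lbls > 0);  np <- length(jp)
  jn <- which(lbls <= 0); nn <- length(jn)
  s0 <- sum(rank(scores)[jp])
  (s0 - np * (np + 1) / 2) / (np * nn)
}

# Hypothetical toy data: four test samples, two positive and two negative
toy_scores <- c(0.9, 0.8, 0.3, 0.1)
toy_lbls   <- c(1, 0, 1, 0)

# Brute-force cross-check: share of (positive, negative) pairs where the
# positive sample outranks the negative one, with ties counted as 0.5
pairwise_auc <- function(scores, lbls)
{
  pos <- scores[lbls > 0]
  neg <- scores[lbls <= 0]
  mean(outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n)))
}

auc_rank  <- auc(toy_scores, toy_lbls)           # 0.75
auc_pairs <- pairwise_auc(toy_scores, toy_lbls)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly here (only the positive at 0.3 falls below the negative at 0.8), so both approaches give 0.75.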
Upvotes: 3