user2947767

Reputation: 1311

How to calculate the AUC value for a ranger RF model?

How can I calculate the AUC value for a ranger model? Ranger is a fast implementation of the random forest algorithm in R. I'm using the following code to build a ranger model for classification and to get predictions from it:

library(ranger)

# Build the model using the ranger() function
ranger.model <- ranger(formula, data = data_train, importance = 'impurity',
                       write.forest = TRUE, num.trees = 3000,
                       mtry = sqrt(length(currentComb)),
                       classification = TRUE)
# Get the predictions from the ranger model
pred.data <- predict(ranger.model, data = data_test)
table(pred.data$predictions)

But I don't know how to calculate the AUC value.

Any ideas?

Upvotes: 3

Views: 3110

Answers (1)

Artem Sokolov

Reputation: 13691

The key to computing AUC is having a way to rank your test samples from "Most likely to be positive" to "Least likely to be positive". Modify your training call to include probability = TRUE. pred.data$predictions should now be a matrix of class probabilities. Make note of the column that corresponds to your "positive" class. This column provides the ranking we need to compute AUC.
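
As a rough sketch of that change, the snippet below fits a probability forest on a two-class subset of the built-in iris data and pulls out the score column by class name. The data set, the choice of "virginica" as the positive class, the number of trees, and all variable names are illustrative assumptions, not part of the original question.

```r
library(ranger)

set.seed(1)

# Toy binary problem (assumption): versicolor vs. virginica from iris
d <- droplevels(subset(iris, Species != "setosa"))
idx <- sample(nrow(d), 70)
train <- d[idx, ]
test <- d[-idx, ]

# probability = TRUE grows a probability forest instead of a hard classifier
fit <- ranger(Species ~ ., data = train, num.trees = 500, probability = TRUE)
pred <- predict(fit, data = test)

# pred$predictions is a matrix with one column of probabilities per class
colnames(pred$predictions)

# Select the positive class by name rather than by column position
scores <- pred$predictions[, "virginica"]
lbls <- as.integer(test$Species == "virginica")
```

Selecting the column by name avoids relying on the order in which the classes happen to appear in the matrix.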

To actually compute AUC, we will use Equation (3) from Hand and Till, 2001. We can implement this equation as follows:

## An AUC estimate that doesn't require explicit construction of an ROC curve
auc <- function(scores, lbls)
{
  stopifnot(length(scores) == length(lbls))
  jp <- which(lbls > 0)    # indices of the positive samples
  np <- length(jp)
  jn <- which(lbls <= 0)   # indices of the negative samples
  nn <- length(jn)
  s0 <- sum(rank(scores)[jp])   # rank sum of the positives; ties get average ranks
  (s0 - np * (np + 1) / 2) / (np * nn)
}

where scores would be the column of pred.data$predictions that corresponds to the positive class, and lbls are the corresponding test labels encoded as a binary vector (1 for positive, 0 or -1 for negative).
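
A quick way to sanity-check the function is to run it on a handful of made-up scores where the answer can be counted by hand. The values below are invented for illustration; the function definition is repeated only so the snippet runs on its own.

```r
# Same rank-based AUC estimate as above, repeated so this snippet is standalone
auc <- function(scores, lbls)
{
  stopifnot(length(scores) == length(lbls))
  jp <- which(lbls > 0)
  np <- length(jp)
  nn <- sum(lbls <= 0)
  s0 <- sum(rank(scores)[jp])
  (s0 - np * (np + 1) / 2) / (np * nn)
}

# Made-up example: two positives (1) and two negatives (0)
scores <- c(0.9, 0.8, 0.3, 0.1)
lbls <- c(1, 0, 1, 0)

# Of the four positive/negative pairs, the positive scores higher in three
auc(scores, lbls)   # 0.75

# A perfect ranking puts every positive above every negative
auc(c(0.9, 0.8, 0.3, 0.1), c(1, 1, 0, 0))   # 1
```

With real predictions, scores would be the positive-class column of pred.data$predictions and lbls the 0/1 encoding of the test labels.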

Upvotes: 3
