Boern

Reputation: 7752

R caret: Maximizing sensitivity for a manually defined positive class during training (classification)

Short Version:

Is there a way to instruct caret to train a classification model

  1. using a user-defined label as the "positive class label"?
  2. optimizing the model for sensitivity during training (instead of ROC)?

Long Version:

I have a dataframe

> feature1 <-                 c(1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0)
> feature2 <-                 c(1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1)
> feature3 <-                 c(0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0)
> TARGET <- factor(make.names(c(1,0,1,1,0,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1)))
> df <- data.frame(feature1, feature2, feature3, TARGET)

And model training is implemented like

> ctrl <- trainControl(
+     method="repeatedcv",
+     repeats = 2)
> 
> tuneGrid <- expand.grid(k = c(2,5,7))
> 
> tune <- train(
+     TARGET ~ .,
+     metric = '???',
+     maximize = TRUE,
+     data = df,
+     method = "knn", 
+     trControl = ctrl, 
+     preProcess = c("center","scale"), 
+     tuneGrid = tuneGrid
+ )
> sclasses <- predict(tune, newdata = df)
> df$PREDICTION <- make.names(factor(sclasses), unique = FALSE, allow_ = TRUE)

I want to maximize the sensitivity (true positive rate) = A / (A + C)

                 Reference
    Predicted    Event    No Event
    Event          A         B
    No Event       C         D

where Event (in the table above) should in my case be X1 (action taken), but caret uses X0 (no action taken).

I can set the positive class for my confusion matrix by using the positive argument like

> confusionMatrix(df$PREDICTION, df$TARGET, positive = "X1")

But is there any way to set this while training (maximizing sensitivity)?

I have already checked whether there is another metric that fits my need, but I wasn't able to find one in the documentation. Do I have to implement my own summaryFunction for trainControl?

Thanks!

Upvotes: 8

Views: 3288

Answers (2)

Jash Shah

Reputation: 2164

I have written a function that makes more intuitive sense to me, i.e. one where the positive class is the second level (the result of levels(TARGET)[2]) and is thus the level used to calculate the sensitivity.

# Adapted from caret::twoClassSummary(), but with the *second* factor level
# treated as the event of interest.
mySummary <- function(data, lev = NULL, model = NULL) {

  lvls <- levels(data$obs)

  if (length(lvls) > 2)
    stop(paste("Your outcome has", length(lvls),
               "levels. The twoClassSummary() function isn't appropriate."))

  caret:::requireNamespaceQuietStop("ModelMetrics")

  if (!all(levels(data[, "pred"]) == lvls))
    stop("levels of observed and predicted data do not match")

  data$y <- as.numeric(data$obs == lvls[2])

  # AUC computed from the predicted probability column of the second level
  rocAUC <- ModelMetrics::auc(ifelse(data$obs == lvls[1], 0, 1),
                              data[, lvls[2]])

  out <- c(rocAUC,
           # sensitivity with the second level as the positive class
           sensitivity(data[, "pred"], data[, "obs"], lvls[2]),
           # specificity with the first level as the negative class
           specificity(data[, "pred"], data[, "obs"], lvls[1]))

  names(out) <- c("ROC", "Sens", "Spec")

  out
}
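To tune on this metric, the function can be plugged into the question's setup via the summaryFunction argument. A minimal sketch, assuming the df and tuneGrid from the question (note that classProbs = TRUE is required, because the summary function reads the class-probability columns):

ctrl <- trainControl(
  method = "repeatedcv",
  repeats = 2,
  classProbs = TRUE,           # needed: mySummary reads the probability columns
  summaryFunction = mySummary
)

tune <- train(
  TARGET ~ .,
  data = df,
  method = "knn",
  metric = "Sens",             # select k by the sensitivity reported above
  trControl = ctrl,
  preProcess = c("center", "scale"),
  tuneGrid = tuneGrid
)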

Upvotes: 0

Bart VdW

Reputation: 438

As far as I know, there is no direct way to specify this during training (I have been searching for it myself for a while). However, I found a workaround: you can simply reorder the levels of the target variable in the dataframe. Since the training algorithm takes the first level as the positive class by default, this solves your problem. Just add one extra line after creating the factor, and that does the trick:

TARGET <- factor(make.names(c(1,0,1,1,0,0,1,0,1,1,1,0,1,0,0,0,1,0,1,1)))
TARGET <- relevel(TARGET, "X1")
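After the relevel, levels(TARGET) is c("X1", "X0"), so caret's built-in twoClassSummary reports the sensitivity for X1. A sketch of how this combines with the setup from the question (again, classProbs = TRUE is required by twoClassSummary):

ctrl <- trainControl(
  method = "repeatedcv",
  repeats = 2,
  classProbs = TRUE,               # required by twoClassSummary
  summaryFunction = twoClassSummary
)

tune <- train(
  TARGET ~ .,
  data = data.frame(feature1, feature2, feature3, TARGET),
  method = "knn",
  metric = "Sens",                 # sensitivity of the first level, now X1
  trControl = ctrl,
  preProcess = c("center", "scale"),
  tuneGrid = tuneGrid
)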

Upvotes: 12
