Keshav M

Extract both training and testing AUROC from caret 10 fold CV

Say I am doing classification like below:

library(mlbench)
data(Sonar)

library(caret)
set.seed(998)

my_data <- Sonar

fitControl <-
  trainControl(
    method = "cv",
    number = 10,
    classProbs = T,
    savePredictions = T,
    summaryFunction = twoClassSummary
  )


model <- train(
  Class ~ .,
  data = my_data,
  method = "xgbTree",
  trControl = fitControl,
  metric = "ROC"
)

For each of the 10 folds, 10% of the data is used for validation. For the best parameters that caret selects, I can use getTrainPerf(model) to get the mean validation AUC over all 10 folds, or model$resample to get the individual AUC value for each fold. However, I cannot get the AUC that the fitted model achieves on the training portion of each fold (i.e., if the training data were fed back into the same model). It would be nice if I could get the individual AUC values for each training set as well.
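
For reference, this is what I can already extract; the held-out performance is there, but nothing for the training portion of each fold:

getTrainPerf(model)   # mean ROC / Sens / Spec over the 10 validation folds
model$resample        # per-fold ROC / Sens / Spec on the held-out 10%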

How can this information be extracted? I want to make sure my model is not overfit (the data set I am working with is very small).

Thanks!


Answers (1)

missuse

As requested in the comments, here is a custom cross-validation loop that evaluates both the training-fold and the held-out (test) fold AUC. I am not sure whether the training-fold performance can be extracted directly from the caret train object.

After running caret's train, extract the fold assignments for the best tune:

library(tidyverse)

# keep only the predictions made with the best tune and recover, for each
# row, the fold it was held out in ("Fold01".."Fold10" -> 1..10)
model$bestTune %>%
  left_join(model$pred) %>%
  select(rowIndex, Resample) %>%
  mutate(Resample = as.numeric(gsub(".*(\\d$)", "\\1", Resample)),
         Resample = ifelse(Resample == 0, 10, Resample)) %>%
  arrange(rowIndex) -> resamples
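
As a side note, with caret's default fold labels ("Fold01" to "Fold10") the fold number can also be recovered by simply dropping the "Fold" prefix; this is just an equivalent sketch of the same step:

model$bestTune %>%
  left_join(model$pred) %>%
  select(rowIndex, Resample) %>%
  mutate(Resample = as.integer(sub("Fold", "", Resample))) %>%
  arrange(rowIndex) -> resamples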

Construct a cross-validation loop that uses the same folds as caret did:

library(xgboost)
train <- my_data[, !names(my_data) %in% "Class"]
label <- as.numeric(my_data$Class) - 1    # recode the two-level factor to 0/1

test_auc <- lapply(1:10, function(x){
  # refit on the 9 training folds with caret's best tune
  # (eta, min_child_weight and subsample from bestTune are not passed here,
  #  so xgboost's defaults are used for those)
  fold_model <- xgboost(data = data.matrix(train[resamples[, 2] != x, ]),
                        label = label[resamples[, 2] != x],
                        nrounds = model$bestTune$nrounds,
                        max_depth = model$bestTune$max_depth,
                        gamma = model$bestTune$gamma,
                        colsample_bytree = model$bestTune$colsample_bytree,
                        objective = "binary:logistic",
                        eval_metric = "auc",
                        print_every_n = 50)
  # AUC on the training part and on the held-out part of the fold
  preds_train <- predict(fold_model, data.matrix(train[resamples[, 2] != x, ]))
  preds_test  <- predict(fold_model, data.matrix(train[resamples[, 2] == x, ]))
  auc_train <- pROC::auc(pROC::roc(response = label[resamples[, 2] != x], predictor = preds_train, levels = c(0, 1)))
  auc_test  <- pROC::auc(pROC::roc(response = label[resamples[, 2] == x], predictor = preds_test, levels = c(0, 1)))
  data.frame(fold = x, auc_train, auc_test)
})

do.call(rbind, test_auc)
#output
   fold auc_train  auc_test
1     1         1 0.9909091
2     2         1 0.9797980
3     3         1 0.9090909
4     4         1 0.9629630
5     5         1 0.9363636
6     6         1 0.9363636
7     7         1 0.9181818
8     8         1 0.9636364
9     9         1 0.9818182
10   10         1 0.8888889

arrange(model$resample, Resample)
#output
         ROC      Sens      Spec Resample
1  0.9909091 1.0000000 0.8000000   Fold01
2  0.9898990 0.9090909 0.8888889   Fold02
3  0.9909091 0.9090909 1.0000000   Fold03
4  0.9444444 0.8333333 0.8888889   Fold04
5  0.9545455 0.9090909 0.8000000   Fold05
6  0.9272727 1.0000000 0.7000000   Fold06
7  0.9181818 0.9090909 0.9000000   Fold07
8  0.9454545 0.9090909 0.8000000   Fold08
9  0.9909091 0.9090909 0.9000000   Fold09
10 0.8888889 0.9090909 0.7777778   Fold10

Why the test-fold AUC from my function and from caret do not match exactly I cannot say for certain. The folds are identical, but the refit above does not pass every tuned parameter to xgboost (eta, min_child_weight and subsample from bestTune are left at their defaults), and the random seed differs as well, so small discrepancies are to be expected. When I compute the AUC directly from caret's held-out predictions, I do reproduce caret's own per-fold values:

model$bestTune %>%
  left_join(model$pred) %>%
  arrange(rowIndex) %>%
  select(M, Resample, obs) %>%          # M = predicted probability of class "M"
  mutate(Resample = as.numeric(gsub(".*(\\d$)", "\\1", Resample)),
         Resample = ifelse(Resample == 0, 10, Resample),
         obs = as.numeric(obs) - 1) %>%
  group_by(Resample) %>%
  do(auc = as.vector(pROC::auc(pROC::roc(response = .$obs, predictor = .$M)))) %>%
  unnest()
#output
   Resample   auc
      <dbl> <dbl>
 1     1.00 0.991
 2     2.00 0.990
 3     3.00 0.991
 4     4.00 0.944
 5     5.00 0.955
 6     6.00 0.927
 7     7.00 0.918
 8     8.00 0.945
 9     9.00 0.991
10    10.0  0.889

But again, I emphasize that the training AUC will tell you little here (it is 1 in every fold), and you should rely on the held-out fold performance. If you would like to bring the two closer, then consider fiddling with the gamma, alpha and lambda regularization parameters.
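
For instance, something like this inside the lapply() above; the values here are placeholders just to show where the regularization parameters go, not tuned suggestions:

fold_model <- xgboost(data = data.matrix(train[resamples[, 2] != x, ]),
                      label = label[resamples[, 2] != x],
                      nrounds = model$bestTune$nrounds,
                      max_depth = model$bestTune$max_depth,
                      gamma = 1,      # minimum loss reduction required to split
                      alpha = 0.5,    # L1 regularization on leaf weights
                      lambda = 2,     # L2 regularization on leaf weights
                      objective = "binary:logistic",
                      eval_metric = "auc",
                      print_every_n = 50)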

With a small data set I would still try to split it train : test = 80 : 20 and use that independent test set to verify whether the CV error is close to the test error.
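
A minimal sketch of that split, stratified on the class with caret's createDataPartition (model_80 below is just a placeholder name for the model refit on the 80% part with the same train() call as above):

set.seed(998)
in_train <- createDataPartition(my_data$Class, p = 0.8, list = FALSE)
training <- my_data[in_train, ]
testing  <- my_data[-in_train, ]

# refit with the same fitControl on `training` (call the result model_80),
# then check the AUC on the untouched 20%:
probs <- predict(model_80, newdata = testing, type = "prob")[, "M"]
pROC::auc(pROC::roc(response = testing$Class, predictor = probs))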

