Lucas Lazari

Reputation: 119

k-fold nested repeated cross validation in R

I need to do a four-fold nested repeated cross-validation to train a model. I wrote the following code, which handles the inner cross-validation, but I'm struggling to create the outer loop.

fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated five times
                           repeats = 5,
                           savePredictions = TRUE,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

model_SVM_P <- train(Group ~ ., data = training_set, 
                 method = "svmPoly", 
                 trControl = fitControl,
                 verbose = FALSE,
                 tuneLength = 5)

I made an attempt to solve the problem:

ntrain=length(training_set)    
train.ext=createFolds(training_set,k=4,returnTrain=TRUE)
test.ext=lapply(train.ext,function(x) (1:ntrain)[-x])

for (i in 1:4){
    model_SVM_P <- train(Group ~ ., data = training_set[train.ext[[i]]], 
                 method = "svmRadial", 
                 trControl = fitControl,
                 verbose = FALSE,
                 tuneLength = 5) 

    }

But it didn't work. How can I write this outer loop?

Upvotes: 2

Views: 1654

Answers (1)

Agile Bean

Reputation: 7141

The rsample package implements the outer loop in its nested_cv() function; see the documentation.
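As a minimal sketch of what that call could look like for the setup in the question: the toy `training_set` below is a stand-in for the real data, and the outer repeat count of 2 is an assumption, since the question does not say how many times the outer four-fold CV should be repeated.

```r
library(rsample)  # nested_cv(), vfold_cv(), analysis(), assessment()

# Stand-in data (hypothetical); replace with the question's `training_set`.
set.seed(42)
training_set <- data.frame(
  Group = factor(rep(c("A", "B"), each = 50)),
  x1 = rnorm(100),
  x2 = rnorm(100)
)

# Outer loop: 4-fold CV, repeated twice (the repeat count is an assumption).
# Inner loop: 10-fold CV repeated five times, as in the question.
folds <- nested_cv(training_set,
                   outside = vfold_cv(v = 4, repeats = 2),
                   inside  = vfold_cv(v = 10, repeats = 5))

# One row per outer split; `inner_resamples` holds the inner splits,
# which are built only from that outer split's analysis (training) rows.
first_outer <- folds$splits[[1]]
n_analysis   <- nrow(analysis(first_outer))    # rows for tuning and fitting
n_assessment <- nrow(assessment(first_outer))  # rows held out for the outer estimate
```

Tuning then happens on each element of `folds$inner_resamples`, and the chosen hyperparameters are scored once on the matching outer assessment set.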

To evaluate the models trained on the nested_cv() resamples, have a look at this vignette, which shows where the heavy lifting is done:

# `object` is an `rsplit` object in `results$inner_resamples` 
summarize_tune_results <- function(object) {
  # Return row-bound tibble that has the 25 bootstrap results
  map_df(object$splits, tune_over_cost) %>%
    # For each value of the tuning parameter, compute the 
    # average RMSE which is the inner bootstrap estimate. 
    group_by(cost) %>%
    summarize(mean_RMSE = mean(RMSE, na.rm = TRUE),
              n = length(RMSE),
              .groups = "drop")
}

tuning_results <- map(results$inner_resamples, summarize_tune_results)

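This code applies the tune_over_cost function to every inner split (or fold). For each candidate value of the cost hyperparameter, the model is fitted on the split's training portion, which rsample calls the "analysis" data, and the RMSE is computed on the held-out portion, which rsample calls the "assessment" data.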

Please check out the vignette for more useful code including parallelization.
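If you would rather stay with caret, the outer loop from the question can also be written by hand. The following is only a sketch under stated assumptions: a two-class subset of `iris` stands in for `training_set`, the inner resampling and `tuneLength` are reduced so it runs quickly, and accuracy on the held-out fold is used as the outer metric. It also addresses three likely problems in the attempt: `length()` on a data frame counts columns rather than rows, the folds should be created from the outcome vector, and row subsetting needs the trailing comma.

```r
library(caret)  # train(), trainControl(), createMultiFolds(); svmPoly needs kernlab

# Stand-in two-class data (hypothetical); replace with the real `training_set`.
set.seed(42)
training_set <- droplevels(iris[iris$Species != "setosa", ])
names(training_set)[names(training_set) == "Species"] <- "Group"

# Inner resampling, reduced here for speed;
# the question uses number = 10, repeats = 5.
fitControl <- trainControl(method = "repeatedcv", number = 5, repeats = 1,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

# Outer folds: stratify on the outcome vector, not the whole data frame.
# Each list element holds the training row indices of one outer fold;
# raise `times` to repeat the outer loop.
outer_train <- createMultiFolds(training_set$Group, k = 4, times = 1)

outer_results <- lapply(outer_train, function(idx) {
  # Tune on the outer training rows only (note the comma: rows, not columns).
  fit <- train(Group ~ ., data = training_set[idx, ],
               method = "svmPoly", trControl = fitControl,
               metric = "ROC", tuneLength = 2)
  # Score the tuned model once on the rows this outer fold held out.
  held_out <- training_set[-idx, ]
  pred <- predict(fit, newdata = held_out)
  data.frame(n_test = nrow(held_out),
             accuracy = mean(pred == held_out$Group))
})

# One row per outer fold; average `accuracy` for the nested CV estimate.
outer_summary <- do.call(rbind, outer_results)
```

Because tuning only ever sees the outer training rows, the averaged outer accuracy is not biased by the hyperparameter search.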

Upvotes: 1
