Reputation: 119
I need to do a four-fold nested repeated cross-validation to train a model. I wrote the following code, which covers the inner cross-validation, but now I'm struggling to create the outer loop.
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated five times
                           repeats = 5,
                           savePredictions = TRUE,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

model_SVM_P <- train(Group ~ ., data = training_set,
                     method = "svmPoly",
                     trControl = fitControl,
                     verbose = FALSE,
                     tuneLength = 5)
I made an attempt to solve the problem:
ntrain <- nrow(training_set)
# outer loop folds: keep the training (analysis) indices of each of the 4 folds
train.ext <- createFolds(training_set$Group, k = 4, returnTrain = TRUE)
test.ext  <- lapply(train.ext, function(x) (1:ntrain)[-x])

for (i in 1:4) {
  model_SVM_P <- train(Group ~ ., data = training_set[train.ext[[i]], ],
                       method = "svmRadial",
                       trControl = fitControl,
                       verbose = FALSE,
                       tuneLength = 5)
}
But it didn't work. How can I set up this outer loop?
Upvotes: 2
Views: 1654
Reputation: 7141
The rsample package has implemented the outer loop in the nested_cv() function; see the documentation.
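For the setup described in the question (four outer folds, with a 10-fold CV repeated five times inside each), a minimal sketch could look like the following. Note that training_set is the data frame from the question, and the fold counts are just the ones described there, not rsample defaults:

library(rsample)

# Outer loop: 4-fold CV; inner loop: 10-fold CV repeated 5 times
nested_folds <- nested_cv(training_set,
                          outside = vfold_cv(v = 4),
                          inside  = vfold_cv(v = 10, repeats = 5))

# nested_folds$splits holds the outer splits; each element of
# nested_folds$inner_resamples holds the inner resamples built
# from the corresponding outer analysis set.

Here vfold_cv() is used for both loops; the vignette instead uses bootstraps for the inner loop (hence the "25 bootstrap results" in the code below), and either works with nested_cv().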
To evaluate the models trained by nested_cv(), have a look at this vignette, which shows where the heavy lifting is done:
library(purrr)  # map(), map_df()
library(dplyr)  # group_by(), summarize()

# `object` is an `rsplit` object in `results$inner_resamples`
summarize_tune_results <- function(object) {
  # Return row-bound tibble that has the 25 bootstrap results
  map_df(object$splits, tune_over_cost) %>%
    # For each value of the tuning parameter, compute the
    # average RMSE which is the inner bootstrap estimate.
    group_by(cost) %>%
    summarize(mean_RMSE = mean(RMSE, na.rm = TRUE),
              n = length(RMSE),
              .groups = "drop")
}

tuning_results <- map(results$inner_resamples, summarize_tune_results)
This code applies the tune_over_cost function to every hyperparameter value and every split (or fold) of the training data; the held-out part of each split is what rsample calls the "assessment" data.
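tune_over_cost itself is defined in the vignette. As a rough sketch of what such a function can look like, assuming an e1071 SVM, a numeric outcome column named y, and an illustrative cost grid (none of which come from the question):

library(e1071)    # svm()
library(purrr)    # map_df()
library(dplyr)    # tibble()
library(rsample)  # analysis(), assessment()

# Fit an SVM on the analysis part of one inner rsplit and compute the
# RMSE on its held-out assessment part, for a single cost value.
svm_rmse <- function(object, cost = 1) {
  mod  <- svm(y ~ ., data = analysis(object), cost = cost)
  pred <- predict(mod, assessment(object))
  tibble(cost = cost,
         RMSE = sqrt(mean((assessment(object)$y - pred)^2, na.rm = TRUE)))
}

# Evaluate a small grid of cost values on one inner rsplit
tune_over_cost <- function(object) {
  map_df(2^seq(-2, 8, by = 1), ~ svm_rmse(object, cost = .x))
}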
Please check out the vignette for more useful code, including parallelization.
Upvotes: 1