Johannes Wiesner

Reputation: 1307

How to do unstratified (shuffled) k-fold cross validation using R?

I would like to split my data into k folds for cross-validation without stratification (but optionally with shuffling). How can this be done in R? All threads I have found so far deal with stratified k-fold, which is not what I want. This is possibly related to this CrossValidated thread. The Python equivalent would be sklearn.model_selection.KFold.

Upvotes: 1

Views: 158

Answers (1)

Michael M

Reputation: 1593

You can try my lightweight package splitTools.

Depending on your use case, you would do something like this:

library(splitTools)

# By default, each fold holds the in-sample (training) row indices;
# set invert = TRUE to get the out-of-sample folds instead
insample_folds <- create_folds(
  1:nrow(iris), k = 10, type = "basic", seed = 1, shuffle = TRUE
)

# Loop over insample folds to get RMSE per fold
rmses <- numeric(0)
for (fold in insample_folds) {
  fit <- lm(Sepal.Length ~ ., data = iris[fold, ])
  valid_errors <- iris$Sepal.Length[-fold] - predict(fit, iris[-fold, ])
  rmses <- c(rmses, sqrt(mean(valid_errors^2)))
}

mean(rmses) # 0.312
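
If you would rather avoid an extra dependency, the same idea can be sketched in base R. This is only a minimal sketch, not how splitTools works internally: the helper name kfold_indices and its arguments are made up for illustration. It optionally shuffles the row indices, cuts them into k near-equal blocks, and returns the out-of-sample (validation) indices per fold, roughly in the spirit of sklearn's KFold.

```r
# Hypothetical helper (not part of splitTools or base R):
# unstratified k-fold indices, optionally shuffled.
kfold_indices <- function(n, k, shuffle = TRUE, seed = NULL) {
  stopifnot(k >= 2, k <= n)
  idx <- seq_len(n)
  if (shuffle) {
    if (!is.null(seed)) set.seed(seed)
    idx <- sample(idx)  # random permutation of the row indices
  }
  # Fold labels 1..k as contiguous blocks whose sizes differ by at most one
  fold_id <- sort(rep(seq_len(k), length.out = n))
  # One vector of validation (out-of-sample) indices per fold
  unname(split(idx, fold_id))
}

# Usage: 10 shuffled folds over the rows of iris
folds <- kfold_indices(nrow(iris), k = 10, shuffle = TRUE, seed = 1)
lengths(folds)
```

Because these are validation indices, the training rows for a given fold are iris[-fold, ] and the held-out rows are iris[fold, ], which is the reverse of the in-sample folds used above.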

Upvotes: 1
