Reputation: 1307
I would like to split my data into k training folds without stratification (but optionally with shuffling). How can I do this in R? All the threads I have found so far deal with stratified k-fold, which is not what I want. This is possibly related to this CrossValidated thread. The Python equivalent would be sklearn.model_selection.KFold.
Upvotes: 1
Views: 158
Reputation: 1593
You can try my lightweight package splitTools.
Depending on your use case, you would do something like this:
library(splitTools)

# type = "basic" gives unstratified folds.
# By default the folds hold in-sample (training) row indices;
# set invert = TRUE to get out-of-sample folds instead.
insample_folds <- create_folds(
  seq_len(nrow(iris)), k = 10, type = "basic", seed = 1, shuffle = TRUE
)

# Loop over in-sample folds to get the validation RMSE per fold
rmses <- numeric(0)
for (fold in insample_folds) {
  # Fit on the training rows, predict on the held-out rows
  fit <- lm(Sepal.Length ~ ., data = iris[fold, ])
  valid_errors <- iris$Sepal.Length[-fold] - predict(fit, iris[-fold, ])
  rmses <- c(rmses, sqrt(mean(valid_errors^2)))
}
mean(rmses) # 0.312
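If you would rather not add a dependency, the same kind of unstratified split can be sketched in base R. This is only a minimal illustration: make_kfold is a hypothetical helper name, not part of splitTools or base R. Unlike the in-sample folds above, it returns the validation (out-of-sample) indices of each fold, and without shuffling the folds are contiguous blocks, which I believe is also how sklearn.model_selection.KFold behaves.

```r
# Hypothetical helper (not from any package): unstratified k-fold split.
# Returns a list of k integer vectors holding the validation row indices.
make_kfold <- function(n, k, shuffle = TRUE, seed = NULL) {
  # Note: set.seed() changes the global RNG state
  if (!is.null(seed)) set.seed(seed)
  # Row order: random permutation if shuffling, otherwise 1..n
  idx <- if (shuffle) sample.int(n) else seq_len(n)
  # Fold labels 1..k in contiguous runs; sizes differ by at most one
  fold_id <- sort(rep(seq_len(k), length.out = n))
  split(idx, fold_id)
}

valid_folds <- make_kfold(nrow(iris), k = 10, shuffle = TRUE, seed = 1)

# Same RMSE loop as above, but indexing with validation rows
rmses <- vapply(valid_folds, function(valid) {
  fit <- lm(Sepal.Length ~ ., data = iris[-valid, ])
  errors <- iris$Sepal.Length[valid] - predict(fit, iris[valid, ])
  sqrt(mean(errors^2))
}, numeric(1))
mean(rmses)
```

Because the shuffling here uses plain sample.int(), the folds (and hence the averaged RMSE) will generally differ from those produced by create_folds() with the same seed.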
Upvotes: 1