Muhammad Kamil

Reputation: 665

Splitting a list of data frames into multiple training and testing sets in R

I have a list of data frames:

df1 <- data.frame(a = 1:4, b = 3:6)
df2 <- data.frame(a = c(5,3,4,4), b = c(9,9,1,0))
df_list <- list(df1, df2)

I want to create a new list with df1_testing, df1_training, df2_testing, and df2_training datasets, with a 75-25 split between train and test sets.

Upvotes: 1

Views: 99

Answers (1)

Maël

Reputation: 52024

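You can do this with a small helper applied to each data frame with lapply. Since the question asks for a 75-25 split between train and test, the sampled 75% of rows go to train and the remaining rows go to test. You could also make the split proportion (here 0.75) a parameter of the function.

split2 <- function(df){
  # randomly pick 75% of the row indices, without replacement
  idx <- sample(x = seq_len(nrow(df)), size = floor(.75*nrow(df)), replace = FALSE)
  # sampled rows form the training set; the remaining rows form the test set
  list(train = df[idx,], test = df[-idx,])
}
lapply(df_list, split2)

Which gives, for example (the split is random, so the selected rows vary between runs):

[[1]]
[[1]]$train
  a b
1 1 3
3 3 5
2 2 4

[[1]]$test
  a b
4 4 6


[[2]]
[[2]]$train
  a b
1 5 9
2 3 9
3 4 1

[[2]]$test
  a b
4 4 0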
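
If you want the split proportion as a parameter, and a single flat list with elements named like df1_training and df1_testing as in the question, here is a minimal sketch. The names split_df, prop, splits and flat are hypothetical and not part of the answer above. It also assumes the input list is named (df1, df2), which the question's df_list is not.

```r
# Hypothetical helper: split one data frame into training and testing parts.
# `prop` (assumed name) is the share of rows assigned to the training set.
split_df <- function(df, prop = 0.75) {
  n <- nrow(df)
  idx <- sample(seq_len(n), size = floor(prop * n))
  # setdiff() keeps the testing set correct even when idx is empty
  list(training = df[idx, , drop = FALSE],
       testing  = df[setdiff(seq_len(n), idx), , drop = FALSE])
}

df1 <- data.frame(a = 1:4, b = 3:6)
df2 <- data.frame(a = c(5, 3, 4, 4), b = c(9, 9, 1, 0))
df_list <- list(df1 = df1, df2 = df2)  # named list (assumption)

set.seed(42)  # only to make the random split reproducible
splits <- lapply(df_list, split_df)

# Flatten one level: names become "df1.training", "df1.testing", ...
flat <- unlist(splits, recursive = FALSE)
# Replace the first "." in each name with "_" to get df1_training etc.
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)
names(flat)
```

With four rows per data frame and prop = 0.75, each training set has three rows and each testing set has one. Which rows land where depends on the seed.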

Upvotes: 1
