Reputation: 633
I'm trying to split my list of data frames into subgroups, either as a nested list or as several separate lists. The split should be based on the number of rows per data frame, so that data frames with the same number of rows end up in the same list.
full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
There are now two data frames with nrow() == 10, so they should end up in their own list or sublist.
I tried something like this, but I don't think split is applicable to lists:
sublist <- lapply(full_list, function(x) split(full_list, f = nrow(x)))
BTW: The greater goal is to split all data frames into a training and a test data set for machine learning with the code below. sample will be used to create the subsets, but I want the same sample_vector for data frames of the same length. Therefore, I want to split the full list into sublists beforehand. Afterwards I will put all data frames together again for further processing (a kind of split-apply-combine). I'm mentioning this in case I'm overcomplicating things here.
# function to split data frames in each sub list into train and test data frames
counter <- 0
train_test_list <- list()
for (x_table in sublist) {
  counter <- counter + 1
  current_name <- paste(names(sublist)[counter], sep = "_")
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set <- x_table[-sample_vector, ]
  train_test_list[[current_name]] <- list(
    train_set = train_set, test_set = test_set,
    table_name = names(sublist)[counter]
  )
}
# combine all lists with test and train pairs back into one list
full_train_test_list <- c(train_test_list1, train_test_list2, train_test_list3, ...)
Upvotes: 3
Views: 370
Reputation: 887991
We can get the number of rows with sapply and split based on that info:
new_list <- split(full_list, sapply(full_list, nrow))
str(new_list)
#List of 3
# $ 10:List of 2
# ..$ df1: int [1:10, 1:10] 1 0 0 1 1 0 1 0 0 1 ...
# ..$ df4: int [1:10, 1:10] 1 0 1 1 1 0 0 0 1 1 ...
# $ 15:List of 1
# ..$ df2: int [1:15, 1:10] 0 1 1 0 0 0 0 0 0 1 ...
# $ 20:List of 1
# ..$ df3: int [1:20, 1:10] 1 1 0 1 0 1 1 1 0 1 ...
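Because the table contents are random, it can help to check the grouping by name rather than by value. The following is a minimal sketch of such a check; the set.seed call and the intermediate n_rows variable are additions for illustration and are not part of the original post.

```r
set.seed(42)  # assumption: fixed seed only so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

# split() accepts a list; the grouping vector needs one value per list element
n_rows <- sapply(full_list, nrow)
new_list <- split(full_list, n_rows)

names(new_list)          # group labels: the row counts as character strings
lengths(new_list)        # number of tables in each group
lapply(new_list, names)  # which tables ended up in which group
```

The group labels are character strings, so a group can be picked out with new_list[["10"]].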
As it is a nested list, we can do the processing in the inner list by calling lapply inside the first lapply:
traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set <- x_table[-sample_vector, ]
  list(train_set = train_set, test_set = test_set)
}))
Checking the output:
traintestlst[[1]]$df1
#$train_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1    1    0    1    0    0    1    1    1     0
#[2,]    1    0    1    1    1    0    0    0    1     0
#[3,]    0    1    0    0    1    1    0    1    1     0
#[4,]    1    1    0    1    0    0    1    0    0     1
#[5,]    0    0    0    1    0    0    1    0    1     0
#[6,]    0    1    1    0    1    0    1    0    1     0
#[7,]    1    0    1    1    0    0    0    0    0     1
#[8,]    0    1    0    0    0    1    0    0    1     0
#$test_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    0    0    0    0    0    1    0    1    0     1
#[2,]    1    0    0    0    0    0    0    1    1     0
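Note that the code above calls sample.int once per table, so two tables in the same group generally get different row samples. If the same sample_vector should be shared within a group, as the question asks, one possible variation is to draw it once per sublist and then flatten the result back into a single list. This is only a sketch under those assumptions: the names train_test_nested and train_rows, the set.seed call, and the drop = FALSE arguments are illustrative additions, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

# split: group the tables by their number of rows
new_list <- split(full_list, sapply(full_list, nrow))

# apply: draw the row indices once per group and reuse them for every table
train_test_nested <- lapply(new_list, function(sublst) {
  n <- nrow(sublst[[1]])  # all tables in a group have the same row count
  sample_vector <- sample.int(n = n, size = floor(0.8 * n), replace = FALSE)
  lapply(sublst, function(x_table) {
    list(
      train_set  = x_table[sample_vector, , drop = FALSE],
      test_set   = x_table[-sample_vector, , drop = FALSE],
      train_rows = sample_vector  # hypothetical field, kept to verify sharing
    )
  })
})

# combine: flatten the groups into one list and restore the original order
full_train_test_list <- do.call(c, unname(train_test_nested))
full_train_test_list <- full_train_test_list[names(full_list)]
```

With this layout, full_train_test_list$df1 and full_train_test_list$df4 are split on identical row indices, while each entry is still addressed by its original table name.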
Upvotes: 4