Reputation: 187
I have a list of data frames, each holding a time series of (x, y) coordinates. Each data frame also has a specific variable, trial_option, which I want to use to split my list of data frames into multiple smaller lists. Each smaller list will contain all the data frames with one trial_option value.
df1 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("A", 10))
df2 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("A", 10))
df3 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("B", 10))
df4 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("B", 10))
df5 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("C", 10))
df6 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("C", 10))
mylist <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5, df6 = df6)
So I want to split mylist into 3 smaller lists: mylistA, mylistB, mylistC.
I thought I could use small_list <- lapply(list, subset, trial_option == A) and repeat that for each trial_option, but that did not return what I wanted. Repeating it for each trial_option also seems tedious and not good practice. I haven't been able to find a suitable answer by googling yet.
Also, once I have these subset lists, I'll be doing some data wrangling and then want to combine the smaller lists back into one big list. Each subset of trial_option data frames needs its own separate data wrangling, which is why I want to split the master list.
Any help is appreciated.
Upvotes: 0
Views: 1353
Reputation: 107652
Whenever you need to perform processing on data frame splits, consider by, the object-oriented wrapper of tapply. While similar to split in creating a named list of subset data frames by one or more factors, by allows you to process each subset further without any lapply or for loop afterwards.
mylist <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5, df6 = df6)
complete_df <- do.call(rbind, mylist)
# NAMED LIST OF DFS (NAMES ARE UNIQUE VALUES OF trial_option: A, B, C)
by_list <- by(complete_df, complete_df$trial_option, FUN = function(d) {
  # DATA WRANGLING WHERE PARAMETER, d, IS SUBSETTED DATAFRAME
  new_d <- d  # ... (PLACEHOLDER: REPLACE WITH ACTUAL WRANGLING STEPS)
  # RETURN A DATAFRAME AFTER PROCESSING
  return(new_d)
})
# ROW BIND ALL DF ELEMENTS (ASSUMES EACH HAVE SAME colnames() AND ncol())
new_complete_df <- do.call(rbind, by_list)
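As a concrete illustration of the pattern above, here is a minimal runnable sketch. The wrangling step (centring x and y within each trial_option group) and the column names x_centred and y_centred are purely hypothetical stand-ins for whatever processing is actually needed, and the example data is a smaller version of the question's.

```r
set.seed(1)  # reproducible example data
complete_df <- data.frame(x = runif(12, -10, 10),
                          y = runif(12, -10, 10),
                          trial_option = rep(c("A", "B", "C"), each = 4))

by_list <- by(complete_df, complete_df$trial_option, FUN = function(d) {
  # HYPOTHETICAL WRANGLING: centre coordinates on the group mean
  d$x_centred <- d$x - mean(d$x)
  d$y_centred <- d$y - mean(d$y)
  d
})

names(by_list)                          # one element per trial_option value
new_complete_df <- do.call(rbind, by_list)
dim(new_complete_df)                    # 12 rows, 5 columns
```

Because the function returns a data frame with the same columns for every group, the final rbind works without further adjustment.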
Upvotes: 0
Reputation: 826
All data frames can be combined into one and then split on trial_option:
df <- rbind(df1, df2, df3, df4, df5, df6)
split(x = df, f = df$trial_option)
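Note that this returns three data frames of pooled rows rather than three lists of the original data frames. If the individual data frames need to stay intact, one possible variant is to split the list itself on the first trial_option value of each element. This is only a sketch, and it assumes every data frame holds a single trial_option value throughout, as in the question's example. The helper make_df and the names groups, sublists and biglist are illustrative only.

```r
# Example data modelled on the question (fewer rows, same structure)
make_df <- function(opt) {
  data.frame(x = runif(3, -10, 10), y = runif(3, -10, 10),
             trial_option = rep(opt, 3))
}
mylist <- list(df1 = make_df("A"), df2 = make_df("A"),
               df3 = make_df("B"), df4 = make_df("B"),
               df5 = make_df("C"), df6 = make_df("C"))

# One grouping label per data frame: its first trial_option value.
# ASSUMPTION: trial_option is constant within each data frame.
groups <- vapply(mylist,
                 function(d) as.character(d$trial_option[1]),
                 character(1))

# Split the list (not the rows) into one sub-list per trial_option
sublists <- split(mylist, groups)
names(sublists)     # "A" "B" "C"
names(sublists$A)   # "df1" "df2"

# After wrangling each sub-list, flatten back into one big list
biglist <- unlist(unname(sublists), recursive = FALSE)
names(biglist)      # "df1" through "df6"
```

Each sub-list can then be processed with its own lapply call before the flattening step.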
Upvotes: 1