Jake

Reputation: 187

Splitting a list of dataframes into multiple lists based on a factor in each dataframe

I have a list of data frames, each holding a time series of (x, y) coordinates. Each data frame also has a variable, trial_option, which I want to use to split my list of data frames into multiple smaller lists. Each smaller list should contain all the data frames that share one trial_option value.

df1 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("A", 10))
df2 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("A", 10))
df3 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("B", 10))
df4 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("B", 10))
df5 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("C", 10))
df6 <- data.frame(x = runif(10, -10, 10), y = runif(10, -10, 10), trial_option = rep("C", 10))
mylist <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5, df6 = df6)

So I want to split mylist into 3 smaller lists: mylistA, mylistB, mylistC. I thought I could use small_list <- lapply(list, subset, trial_option == A) and repeat that for each trial_option, but that did not return what I wanted. I also feel that repeating it for each trial_option would be tedious and not good practice. I haven't been able to find a suitable answer by googling yet.

Also, once I have these subset lists, I'll be doing some data wrangling and I then want to combine these smaller lists all back into a big list. Each subset of trial_option data frames needs to have separate data wrangling done, hence why I want to split the master list.

Any help is appreciated.

Upvotes: 0

Views: 1353

Answers (2)

Parfait

Reputation: 107652

Whenever you need to process splits of a data frame, consider by, the object-oriented wrapper for tapply. Like split, it creates a named list of subset data frames by one or more factors, but by also lets you process each subset further without a separate lapply or for loop afterwards.

mylist <- list(df1 = df1, df2 = df2, df3 = df3, df4 = df4, df5 = df5, df6 = df6)

complete_df <- do.call(rbind, mylist)

# NAMED LIST OF DFS (NAMES ARE UNIQUE VALUES OF trial_option: A, B, C)
by_list <- by(complete_df, complete_df$trial_option, FUN = function(d) {
    # DATA WRANGLING WHERE PARAMETER, d, IS SUBSETTED DATAFRAME
    new_d <- d  # PLACEHOLDER: REPLACE WITH YOUR OWN WRANGLING OF d
    # RETURN A DATAFRAME AFTER PROCESSING
    return(new_d)
})

# ROW BIND ALL DF ELEMENTS (ASSUMES EACH HAVE SAME colnames() AND ncol())
new_complete_df <- do.call(rbind, by_list)   
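
For a concrete run of the pattern above, here is a minimal sketch. The wrangling step (centering x within each trial_option group) and the x_centered column are made up for illustration, and set.seed is only there for reproducibility; swap in whatever processing you actually need.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

# Rebuild the question's list: two data frames per trial_option
options_vec <- c("A", "A", "B", "B", "C", "C")
mylist <- lapply(options_vec, function(opt) {
  data.frame(x = runif(10, -10, 10),
             y = runif(10, -10, 10),
             trial_option = rep(opt, 10))
})
names(mylist) <- paste0("df", seq_along(mylist))

complete_df <- do.call(rbind, mylist)

# One processed data frame per trial_option value
by_list <- by(complete_df, complete_df$trial_option, FUN = function(d) {
  # hypothetical wrangling: center x within this trial_option group
  d$x_centered <- d$x - mean(d$x)
  d
})

# Stack the processed groups back into a single data frame
new_complete_df <- do.call(rbind, by_list)
```

Note that this stacks the data frames before grouping, so each group is one combined data frame and not a list of the original data frames.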

Upvotes: 0

Aleh

Reputation: 826

All data frames can be combined into one and then split on trial_option:

df <- rbind(df1, df2, df3, df4, df5, df6)
split(x = df, f = df$trial_option)
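
If the individual data frames need to stay separate (the question asks for smaller lists of data frames), split also works on the list itself. This is a rough sketch, assuming trial_option is constant within each data frame as in the question's example; the names key, sublists and combined are just illustrative.

```r
# Rebuild the question's list: two data frames per trial_option
options_vec <- c("A", "A", "B", "B", "C", "C")
mylist <- lapply(options_vec, function(opt) {
  data.frame(x = runif(10, -10, 10),
             y = runif(10, -10, 10),
             trial_option = rep(opt, 10))
})
names(mylist) <- paste0("df", seq_along(mylist))

# Grouping key: the first trial_option value of each data frame
# (assumes the value is the same for every row of a data frame)
key <- vapply(mylist, function(d) as.character(d$trial_option[1]), character(1))

# Named list of three sub-lists (A, B, C), each holding whole data frames
sublists <- split(mylist, key)

# ... per-group wrangling on sublists$A, sublists$B, sublists$C goes here ...

# Flatten one level to get a single list of data frames again
combined <- unlist(unname(sublists), recursive = FALSE)
```

Here sublists$A holds df1 and df2, and combined is again a flat list of six data frames under their original names.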

Upvotes: 1
