Take 20+ subsets of data?

Question

I have a dataset and would like to take many subsets of it based on various columns, values, and comparison operators. The most useful output would be a list with each subsetted data frame as a separate element. I tried building a data frame of the subset conditions, writing a function, and using apply to feed each row of that data frame to the function, but that didn't work. There is probably a better approach using an anonymous function or something similar, but I'm not sure how to implement it. Below is example code that should produce 8 subsets of data.

Original dataset, where x1 and x2 are item scores that won't be used for subsetting, and RT and LS are the variables to subset on:

df <- data.frame(x1 = rnorm(100),
                 x2 = rnorm(100),
                 RT = abs(rnorm(100)),
                 LS = sample(1:10, 100, replace = TRUE))

Data frame containing the conditions for subsetting. E.g., the first subset should be all observations with values greater than or equal to 0.5 in the RT column, the second subset should be all observations with values greater than or equal to 1 in the RT column, etc. There should be 8 subsets: 4 on the RT variable and 4 on the LS variable.

subsetConditions <- data.frame(column = rep(c("RT", "LS"), each = 4),
                      operator = rep(c(">=", "<="), each = 4),
                      value = c(0.5, 1, 1.5, 2,
                                9, 8, 7, 6))

And this is the ugly function I wrote to attempt to do this:

subsetFun <- function(x){
  subset(df, eval(parse(text = paste(x))))
}  

subsets <- apply(subsetConditions, 1, subsetFun)

Thanks for any help!

Parfait · Accepted Answer

Consider Map (a wrapper around mapply), which avoids eval + parse. Operators such as ==, <=, and >= are ordinary functions of two arguments, so 4 <= 5 can also be written as `<=`(4, 5) or "<="(4, 5). Simply pass the columns of subsetConditions elementwise and use get to look up each operator function from its string name:

sub_data <- function(col, op, val) {
  df[get(op)(df[[col]], val),]
}

sub_dfs <- with(subsetConditions, Map(sub_data, column, operator, value))
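
The operator-as-function idea that sub_data relies on can be checked in isolation. The following is a minimal sketch; the variable names (infix, quoted, via_get, via_match) and the small test vector are made up for illustration, and match.fun is shown only as a possible alternative to get, not as part of the answer above.

```r
# An infix comparison and its function-call forms give the same result
infix  <- 4 <= 5
quoted <- `<=`(4, 5)     # backtick-quoted function call
string <- "<="(4, 5)     # string-quoted function call

# Looking the operator up from a string, as sub_data does with get(op)
op <- ">="
via_get <- get(op)(c(0.2, 0.7, 1.3), 0.5)

# match.fun() also resolves a string to a function and errors
# if the name does not refer to a function
via_match <- match.fun(op)(c(0.2, 0.7, 1.3), 0.5)

print(via_get)    # FALSE  TRUE  TRUE
```

Because the result is a logical vector the same length as the column, it can be used directly as a row index, which is what df[get(op)(df[[col]], val), ] does.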

Output

str(sub_dfs)
List of 8
 $ RT:'data.frame': 62 obs. of  4 variables:
  ..$ x1: num [1:62] -1.12 -0.745 -1.377 0.848 1.63 ...
  ..$ x2: num [1:62] -0.257 -2.385 0.805 -0.313 0.662 ...
  ..$ RT: num [1:62] 0.693 1.662 0.731 2.145 0.543 ...
  ..$ LS: int [1:62] 5 5 1 2 9 1 5 9 3 10 ...
 $ RT:'data.frame': 36 obs. of  4 variables:
  ..$ x1: num [1:36] -0.745 0.848 0.908 -0.761 0.74 ...
  ..$ x2: num [1:36] -2.3849 -0.3131 -2.4645 -0.0784 0.8512 ...
  ..$ RT: num [1:36] 1.66 2.15 1.74 1.65 1.13 ...
  ..$ LS: int [1:36] 5 2 1 5 9 10 2 7 1 3 ...
 $ RT:'data.frame': 14 obs. of  4 variables:
  ..$ x1: num [1:14] -0.745 0.848 0.908 -0.761 -1.063 ...
  ..$ x2: num [1:14] -2.3849 -0.3131 -2.4645 -0.0784 -2.9886 ...
  ..$ RT: num [1:14] 1.66 2.15 1.74 1.65 2.63 ...
  ..$ LS: int [1:14] 5 2 1 5 5 6 9 4 8 4 ...
 $ RT:'data.frame': 3 obs. of  4 variables:
  ..$ x1: num [1:3] 0.848 -1.063 0.197
  ..$ x2: num [1:3] -0.313 -2.989 0.709
  ..$ RT: num [1:3] 2.15 2.63 2.05
  ..$ LS: int [1:3] 2 5 6
 $ LS:'data.frame': 92 obs. of  4 variables:
  ..$ x1: num [1:92] -1.12 -0.745 -1.377 0.848 0.612 ...
  ..$ x2: num [1:92] -0.257 -2.385 0.805 -0.313 0.958 ...
  ..$ RT: num [1:92] 0.693 1.662 0.731 2.145 0.489 ...
  ..$ LS: int [1:92] 5 5 1 2 1 9 1 5 9 3 ...
 $ LS:'data.frame': 78 obs. of  4 variables:
  ..$ x1: num [1:78] -1.12 -0.745 -1.377 0.848 0.612 ...
  ..$ x2: num [1:78] -0.257 -2.385 0.805 -0.313 0.958 ...
  ..$ RT: num [1:78] 0.693 1.662 0.731 2.145 0.489 ...
  ..$ LS: int [1:78] 5 5 1 2 1 1 5 3 5 2 ...
 $ LS:'data.frame': 75 obs. of  4 variables:
  ..$ x1: num [1:75] -1.12 -0.745 -1.377 0.848 0.612 ...
  ..$ x2: num [1:75] -0.257 -2.385 0.805 -0.313 0.958 ...
  ..$ RT: num [1:75] 0.693 1.662 0.731 2.145 0.489 ...
  ..$ LS: int [1:75] 5 5 1 2 1 1 5 3 5 2 ...
 $ LS:'data.frame': 62 obs. of  4 variables:
  ..$ x1: num [1:62] -1.12 -0.745 -1.377 0.848 0.612 ...
  ..$ x2: num [1:62] -0.257 -2.385 0.805 -0.313 0.958 ...
  ..$ RT: num [1:62] 0.693 1.662 0.731 2.145 0.489 ...
  ..$ LS: int [1:62] 5 5 1 2 1 1 5 3 5 2 ...
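
Note that the list names above repeat (four elements named RT, four named LS), so an element cannot be picked out unambiguously by name. One possible follow-up, sketched here under assumptions, is to label each element with its full condition. The seed is arbitrary, so the row counts will not match the output above, and stringsAsFactors = FALSE is set explicitly because before R 4.0 data.frame would otherwise turn the column and operator strings into factors.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

df <- data.frame(x1 = rnorm(100),
                 x2 = rnorm(100),
                 RT = abs(rnorm(100)),
                 LS = sample(1:10, 100, replace = TRUE))

subsetConditions <- data.frame(column   = rep(c("RT", "LS"), each = 4),
                               operator = rep(c(">=", "<="), each = 4),
                               value    = c(0.5, 1, 1.5, 2, 9, 8, 7, 6),
                               stringsAsFactors = FALSE)

# Same function as in the answer
sub_data <- function(col, op, val) {
  df[get(op)(df[[col]], val), ]
}

sub_dfs <- with(subsetConditions, Map(sub_data, column, operator, value))

# Replace the repeated "RT"/"LS" names with one label per condition,
# e.g. "RT >= 0.5" or "LS <= 9"
names(sub_dfs) <- with(subsetConditions, paste(column, operator, value))

# Number of rows kept by each condition
print(sapply(sub_dfs, nrow))

# Individual subsets can now be retrieved by their condition
rt_subset <- sub_dfs[["RT >= 1.5"]]
```

With unique names, sapply(sub_dfs, nrow) gives a quick, labeled check that each threshold filters as expected.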
