A replacement for `subset()` for a list of data.frames

Question

Function foo1 can subset (using subset()) a list of data.frames by one or more requested variables (e.g., by = ESL == 1 or by == ESL == 1 & type == 4).

However, I'm aware of the danger of using subset() in R. Thus, I wonder in foo1 below, what I can use instead of subset() to get the same output?

foo1 <- function(data, by){

  s <- substitute(by)
  L <- split(data, data$study.name) ; L[[1]] <- NULL

  lapply(L, function(x) do.call("subset", list(x, s))) ## What to use instead of `subset`
                                                       ## to get the same output?
}

# EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", header=TRUE) # DATA
foo1(D, ESL == 1)

Roland · Accepted Answer

You can compute on the language. Building on my answer to "Working with substitute after $ sign in R":

foo1 <- function(data, by){

  s <- substitute(by)
  L <- split(data, data$study.name) ; L[[1]] <- NULL

  E <- quote(x$a)
  E[[3]] <- s[[2]]
  s[[2]] <- E

  eval(bquote(lapply(L, function(x) x[.(s),])))
}

foo1(D, ESL == 1)

This gets more complex for arbitrary subset expressions. You'd need a recursive function that crawls the parse tree and inserts the calls to $ at the right places.

Personally, I'd just use package data.table where this is easier because you don't need $, i.e., you can just do eval(bquote(lapply(L, function(x) setDT(x)[.(s),]))) without changing s. OTOH, I wouldn't do this at all. There is really no reason to split before subsetting.

A replacement for `subset()` for a list of data.frames

Answers (2)

Related Questions