Reputation: 7517
Function foo1 can subset (using subset()) a list of data.frames by one or more requested variables (e.g., by = ESL == 1 or by = ESL == 1 & type == 4).
However, I'm aware of the danger of using subset() in R. Thus, I wonder: in foo1 below, what can I use instead of subset() to get the same output?
foo1 <- function(data, by){
  s <- substitute(by)
  L <- split(data, data$study.name) ; L[[1]] <- NULL
  lapply(L, function(x) do.call("subset", list(x, s))) ## What to use instead of `subset`
                                                       ## to get the same output?
}
# EXAMPLE OF USE:
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/k.csv", header=TRUE) # DATA
foo1(D, ESL == 1)
Upvotes: 1
Views: 104
Reputation: 132706
You can compute on the language. Building on my answer to "Working with substitute after $ sign in R":
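foo1 <- function(data, by){
  s <- substitute(by)
  L <- split(data, data$study.name) ; L[[1]] <- NULL
  E <- quote(x$a)     # template call: x$<column>
  E[[3]] <- s[[2]]    # put the left operand of `by` (e.g. ESL) in place of `a`
  s[[2]] <- E         # `ESL == 1` becomes `x$ESL == 1`
  eval(bquote(lapply(L, function(x) x[.(s),])))
}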
foo1(D, ESL == 1)
This gets more complex for arbitrary subset expressions. You'd need a recursive function that crawls the parse tree and inserts the calls to $ at the right places.
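A minimal sketch of such a recursive crawl, under some assumptions: the helper name qualify_cols and the toy data frame are hypothetical, every bare symbol that matches a column name is treated as a column, and the expression contains no empty arguments and no existing $ calls. It is meant to illustrate the idea, not to be a robust implementation.

```r
# Hypothetical helper: walk the parse tree of `expr` and rewrite every bare
# symbol that names a column as `x$<column>`. Function names (the first
# element of each call) and constants are left untouched.
qualify_cols <- function(expr, cols, obj = quote(x)) {
  if (is.name(expr)) {
    if (as.character(expr) %in% cols) call("$", obj, expr) else expr
  } else if (is.call(expr)) {
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- qualify_cols(expr[[i]], cols, obj)
    }
    expr
  } else {
    expr
  }
}

# Toy data (an assumption, not the data from the question)
toy <- data.frame(
  study.name = c("A", "A", "B", "B"),
  ESL        = c(1, 0, 1, 1),
  type       = c(4, 4, 2, 4),
  stringsAsFactors = FALSE
)

s <- qualify_cols(quote(ESL == 1 & type == 4), names(toy))
L <- split(toy, toy$study.name)
# eval(s) runs inside the anonymous function, where `x` is the current piece
res <- lapply(L, function(x) x[eval(s), , drop = FALSE])
```

Note that, unlike subset(), indexing with [ returns rows of NA wherever the condition evaluates to NA, so the two can differ on data with missing values.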
Personally, I'd just use package data.table where this is easier because you don't need $, i.e., you can just do eval(bquote(lapply(L, function(x) setDT(x)[.(s),]))) without changing s. OTOH, I wouldn't do this at all. There is really no reason to split before subsetting.
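To illustrate the last point, here is a rough base-R sketch that filters the whole data frame once and splits afterwards. The function name filter_then_split and the toy data are hypothetical. The condition is evaluated the same way subset() does it internally, so this only shows the ordering and does not remove the non-standard-evaluation concerns. The result also differs slightly: a study with no matching rows drops out of the list instead of appearing as an empty data frame.

```r
# Hypothetical helper: filter first, then split by study.
filter_then_split <- function(data, by) {
  # Evaluate the unevaluated condition with the columns of `data` in scope
  cond <- eval(substitute(by), data, parent.frame())
  keep <- !is.na(cond) & cond          # treat NA as "do not keep"
  kept <- data[keep, , drop = FALSE]
  split(kept, kept$study.name)
}

# Toy data (an assumption, not the data from the question)
toy <- data.frame(
  study.name = c("A", "A", "B", "B", "C"),
  ESL        = c(1, 0, 1, 1, 0),
  type       = c(4, 4, 2, 4, 4),
  stringsAsFactors = FALSE
)

out <- filter_then_split(toy, ESL == 1 & type == 4)
```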
Upvotes: 1
Reputation: 226192
I would guess (based on general knowledge and a quick skim of the answers to the "dangers of subset()" question) that the dangers of subset() are intrinsic dangers of non-standard evaluation (NSE); if you want to be able to pass a generic expression and have it evaluated within the context of a data frame, I think you're more or less stuck with subset() or something like it.
If you were willing to use a more constrained set of expressions such as var, vals (looking for cases where the variable indexed by string var took on values in the vector vals) you could use
d[d[[var]] %in% vals, ]
Here var is a string, not a naked R symbol ("cyl" rather than cyl); it's unambiguous that you want to extract it from the data frame.
You could extend this to a vector of variables and a list of vectors of values:
for (i in seq_along(vars)) {
  d <- d[d[[vars[i]]] %in% vals[[i]], ]
}
but if you want the full flexibility of expressions (e.g. to be able to use either ESL == 1 & type == 4 or ESL == 1 | type == 4, or inequalities based on numeric variables) I think you're stuck with an NSE-based approach.
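For the constrained case, the loop above could be wrapped in a small function and applied to each study. This is only a sketch: the name filter_by_values and the toy data are hypothetical, and it supports only "column is one of these values" conditions combined with AND.

```r
# Hypothetical helper: keep rows where each column named in `vars` takes one
# of the allowed values in the matching element of `vals`.
filter_by_values <- function(d, vars, vals) {
  stopifnot(length(vars) == length(vals), all(vars %in% names(d)))
  for (i in seq_along(vars)) {
    d <- d[d[[vars[i]]] %in% vals[[i]], , drop = FALSE]
  }
  d
}

# Toy data (an assumption, not the data from the question)
toy <- data.frame(
  study.name = c("A", "A", "B", "B"),
  ESL        = c(1, 0, 1, 1),
  type       = c(4, 4, 2, 4),
  stringsAsFactors = FALSE
)

# ESL must be 1 AND type must be 2 or 4, applied study by study
res <- lapply(split(toy, toy$study.name), filter_by_values,
              vars = c("ESL", "type"), vals = list(1, c(2, 4)))
```

Because the column names are passed as strings, nothing here depends on how an unevaluated expression is looked up, which is the trade-off described above: less flexibility in exchange for no ambiguity.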
It's conceivable that the new-ish "tidy eval" machinery (in the rlang package, documented in some detail here) would give you a slightly more principled approach, but I don't think the dangers will completely go away.
Upvotes: 1