I have a question about data.table syntax when variables are used as inputs. As an example I am using the standard data set from the data.table intro vignette (https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html).
library(data.table)
input <- if (file.exists("flights14.csv")) {
"flights14.csv"
} else {
"https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
flights <- fread(input)
The intro covers using variables as inputs only briefly, not to the extent I need. How do I write this (slightly modified) example from the data.table intro
ans <- flights[carrier == "AA" & month == 6L,
.(mean(arr_delay), mean(dep_delay)),
by = .(origin, dest, month)]
ans
entirely in terms of arbitrary variables, e.g.
var1 = c("carrier", "month")
var2 = c("AA",6L)
var3 = c(mean,mean)
var4 = c("arr_delay", "dep_delay")
var5 = c("origin","dest","month")
?
I want the same output as ans, but the result should depend only on var1 to var5 and on operators like by, .() or ==.
I have tried various combinations of ..vari, with = FALSE and even get(vari), but I am not getting the results I want.
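For reference, here is a minimal sketch of what each of the three tools mentioned above does on its own. It uses a small hypothetical table `toy` instead of flights, and the names `toy`, `cols`, `col` and `grp` are illustrative only. Each tool covers one piece (selecting columns, a single-column condition, grouping), which is why none of them alone turns two vectors into a combined `==` filter.

```r
library(data.table)

# Hypothetical toy table standing in for flights (not the real data)
toy <- data.table(
  carrier = c("AA", "AA", "DL", "AA"),
  month   = c(6L, 6L, 6L, 7L),
  origin  = c("JFK", "JFK", "LGA", "JFK"),
  dest    = c("LAX", "LAX", "ATL", "LAX")
)

cols <- c("carrier", "month")

# ..cols: the ".." prefix makes data.table look up 'cols' in the calling scope
sel1 <- toy[, ..cols]

# with = FALSE: j is treated as a character vector of column names
sel2 <- toy[, cols, with = FALSE]

# get(): fetches a single column by name inside an expression, here in i
col  <- "carrier"
rows <- toy[get(col) == "AA"]

# by also accepts a character vector of column names stored in a variable
grp <- c("origin", "dest")
agg <- toy[, .(n = .N), by = grp]
```

Here `sel1` and `sel2` hold the same two columns, `rows` keeps the three "AA" rows, and `agg` counts rows per origin/dest pair.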
Here's one way to do this:
library(data.table)
# keep rows where each column named in var1 equals the matching value in var2
tmp <- flights[rowSums(sweep(flights[, ..var1], 2, var2, `==`)) == length(var1)]
# apply the var3 functions to the var4 columns, grouped by the var5 columns
ans1 <- tmp[, Map(function(x, y) x(y), var3, .SD), .SDcols = var4, by = var5]
# check the result against the hard-coded query
identical(ans, ans1)
#[1] TRUE
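Note that `var2 = c("AA", 6L)` coerces `6L` to the string `"6"`; the `sweep()` filter still matches because R converts the integer `month` column to character before comparing with `==`. As a hedged alternative (not the answerer's code), the filter can be built pair by pair with `Map()` and `Reduce()`, so each value keeps its own type. The sketch below assumes a hypothetical five-row `toy` table instead of flights14.csv and stores `var2` and `var3` as lists; it then compares the result with the hard-coded query.

```r
library(data.table)

# Hypothetical toy data; the question itself uses flights14.csv
toy <- data.table(
  carrier   = c("AA", "AA", "DL", "AA", "AA"),
  month     = c(6L, 6L, 6L, 7L, 6L),
  origin    = c("JFK", "JFK", "LGA", "JFK", "EWR"),
  dest      = c("LAX", "LAX", "ATL", "LAX", "MIA"),
  arr_delay = c(10, 20, 5, 30, 8),
  dep_delay = c(1, 3, 2, 4, 6)
)

var1 <- c("carrier", "month")
var2 <- list("AA", 6L)          # a list keeps 6L an integer
var3 <- list(mean, mean)
var4 <- c("arr_delay", "dep_delay")
var5 <- c("origin", "dest", "month")

# One logical vector per (column, value) pair, combined with &
keep <- Reduce(`&`, Map(function(col, val) toy[[col]] == val, var1, var2))

# Apply the i-th function in var3 to the i-th column in var4, per group
res <- toy[keep, Map(function(f, x) f(x), var3, .SD), .SDcols = var4, by = var5]

# Hard-coded reference query for comparison
ref <- toy[carrier == "AA" & month == 6L,
           .(mean(arr_delay), mean(dep_delay)),
           by = .(origin, dest, month)]
```

`Map()` pairs `var1` with `var2` and `var3` with the `.SD` columns by position, so each pair of vectors must be listed in matching order.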