How do I use data.table completely with variables?

Question

I have a question about using variables as inputs in data.table syntax. As an example, I am using the standard data set from the data.table intro vignette (https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html).

library(data.table)
# Use a local copy of the data if present, otherwise download it
input <- if (file.exists("flights14.csv")) {
  "flights14.csv"
} else {
  "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
flights <- fread(input)

The vignette shows the use of variables as inputs only briefly, and not to the extent I need. How do I write the (slightly modified) example from the data.table intro

ans <- flights[carrier == "AA" & month == 6L,
        .(mean(arr_delay), mean(dep_delay)),
        by = .(origin, dest, month)]
ans

completely with arbitrary variables, e.g.

var1 = c("carrier", "month")
var2 = c("AA", 6L)
var3 = c(mean, mean)
var4 = c("arr_delay", "dep_delay")
var5 = c("origin", "dest", "month")

?

I want the same output as ans, but the result should depend only on var1 to var5 and on operators such as by, .() or ==. I have tried various combinations of ..vari, with=F and even get(vari), but none of them gives the result I want.

Ronak Shah · Accepted Answer

Here's one way to do this:

library(data.table)
# Keep only the rows where every column named in var1 equals the matching value in var2
tmp <- flights[rowSums(sweep(flights[, ..var1], 2, var2, `==`)) == length(var1)]
# Apply each function in var3 to the matching column in var4, grouped by the var5 columns
ans1 <- tmp[, Map(function(x, y) x(y), var3, .SD), .SDcols = var4, by = var5]
# Check the result against the hard-coded version
identical(ans, ans1)
#[1] TRUE
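
One detail to keep in mind: c("AA", 6L) is a character vector, because c() coerces 6L to "6", so the month comparison above relies on implicit type conversion. As a minimal sketch of an alternative, assuming var2 can be given as a list instead, the filter can be written as a join on a one-row lookup table, which keeps each value's type. The toy table dt and the names keys and res are made up for illustration and stand in for flights; the aggregation step is the same Map() call as above.

```r
library(data.table)

# Toy stand-in for flights (hypothetical values, not the real data set)
dt <- data.table(
  carrier   = c("AA", "AA", "AA", "UA", "AA"),
  month     = c(6L, 6L, 7L, 6L, 6L),
  origin    = c("JFK", "JFK", "JFK", "JFK", "LGA"),
  dest      = c("LAX", "LAX", "LAX", "LAX", "ORD"),
  arr_delay = c(10, 20, 30, 40, 50),
  dep_delay = c(1, 3, 5, 7, 9)
)

var1 <- c("carrier", "month")
var2 <- list("AA", 6L)  # assumption: a list, so "AA" stays character and 6L stays integer
var3 <- c(mean, mean)
var4 <- c("arr_delay", "dep_delay")
var5 <- c("origin", "dest", "month")

# One-row lookup table: filter values named after the filter columns
keys <- as.data.table(setNames(var2, var1))

# Inner join on the filter columns keeps only matching rows,
# then each function in var3 is applied to the matching var4 column by group
res <- dt[keys, on = var1, nomatch = NULL][
  , Map(function(f, col) f(col), var3, .SD), by = var5, .SDcols = var4]
res
```

On the real data, replacing dt with flights should give the same grouping as ans, though that equivalence is only checked here on the toy table.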
