Reputation: 60004
Suppose I have a data.table
a <- data.table(id=c(1,1,2,2,3),a=21:25,b=11:15,key="id")
I can add new columns like this:
a[, sa := sum(a), by="id"]
a[, sb := sum(b), by="id"]
> a
id a b sa sb
1: 1 21 11 43 23
2: 1 22 12 43 23
3: 2 23 13 47 27
4: 2 24 14 47 27
5: 3 25 15 25 15
However, suppose that I have column names instead:
for (n in c("a","b")) {
s <- paste0("s",n)
a[, s := sum(n), by="id", with=FALSE] # ERROR: invalid 'type' (character) of argument
}
what do I do?
Upvotes: 12
Views: 2642
Reputation: 7113
..
data.table
also supports the ..
syntax to "look up a level", obviating the need for with=FALSE
in most cases, e.g. dt[ , ..n1]
and dt[ , ..n2]
in the below
have a look at with
in ? data.table
:
dt <- data.table(id=1:5,a=21:25,b=11:15,key="id")
dt[, n3 := dt[ , n1, with = FALSE ] * dt[ , n2, with = FALSE ], with = FALSE ]
EDIT:
Or you just change the colnames forth and back:
dt <- data.table(id=1:5,a=21:25,b=11:15,key="id")
dt[ , dt.names["n3"] := 1L, with = FALSE ]
dt.names <- c( n1 = "a", n2 = "b", n3 = "c" )
setnames( dt, dt.names, names(dt.names) )
dt[ , n3 := n1 * n2, by = "id" ]
setnames( dt, names(dt.names), dt.names )
which works together with by.
Upvotes: 2
Reputation: 115382
Here is an approach that does the call mangling and avoids any overhead with .SD
# a helper function
makeCall <- function(x,fun) bquote(.(fun)(.(x)))
# the columns you wish to sum (apply function to)
cols <- c('a','b')
new.cols <- paste0('s',cols)
# create named list of names
name.cols <- setNames(sapply(cols,as.name), new.cols)
# create the call
my_call <- as.call(c(as.name(':='), lapply(name.cols, makeCall, fun = as.name('sum'))))
(a[, eval(my_call), by = 'id'])
# id a b sa sb
# 1: 1 21 11 43 23
# 2: 1 22 12 43 23
# 3: 2 23 13 47 27
# 4: 2 24 14 47 27
# 5: 3 25 15 25 15
Upvotes: 0
Reputation: 49448
You can also do this:
a <- data.table(id=c(1,1,2,2,3),a=21:25,b=11:15,key="id")
a[, c("sa", "sb") := lapply(.SD, sum), by = id]
Or slightly more generally:
cols.to.sum = c("a", "b")
a[, paste0("s", cols.to.sum) := lapply(.SD, sum), by = id, .SDcols = cols.to.sum]
Upvotes: 11
Reputation: 59592
This is similar to :
How to generate a linear combination of variables and update table using data.table in a loop call?
but you want to combine this with by=
too, so set()
isn't flexible enough. That's a deliberate design design and set()
is unlikely to change in that regard.
I sometimes use the EVAL
helper at the end of that answer.
https://stackoverflow.com/a/20808573/403310
Some wince at that approach but I just think of it like constructing a dynamic SQL statement, which is quite common practice. The EVAL
approach gives ultimate flexibility without head scratching about eval()
and quote()
. To see the dynamic query that's been constructed (to check it) you can add a print
inside your EVAL
helper function.
However, in this simple example you can wrap the LHS of :=
with brackets to tell data.table
to lookup the value (clearer than with=FALSE
), and the RHS needs a get()
.
for (n in c("a","b")) {
s <- paste0("s",n)
a[, (s) := sum(get(n)), by="id"]
}
Upvotes: 8