Ola Caster

Reputation: 88

What is the preferred way to programmatically define multiple new data.table columns?

The FAQ states that the preferred way to add a new column to a data.table when programming is to use quote() and then eval(). But what if I want to add several columns at once? Playing around with this I came up with the following solution:

library(data.table)
DT <- data.table(V1=1:1000,
                 V2=2001:3000)
col.names <- c("V3","V4")
col.specs <- vector("list",2)
col.specs[[1]] <- quote(V1**2)
col.specs[[2]] <- quote((V1+V2)/2)

DT[,c(col.names) := lapply(col.specs,eval,envir=DT)]

which yields the desired result:

> head(DT)
   V1   V2 V3   V4
1:  1 2001  1 1001
2:  2 2002  4 1002
3:  3 2003  9 1003
4:  4 2004 16 1004
5:  5 2005 25 1005
6:  6 2006 36 1006

My question is simply: is this the preferred method? Specifically, can someone think of a way to avoid specifying the environment in the lapply() call? If I leave it out I get:

> DT[,c(col.names) := lapply(col.specs,eval)]
Error in eval(expr, envir, enclos) : object 'V1' not found

It may be no big deal, but at least to me it feels a bit suspicious that the data table does not recognise its own columns. Also, if I add the columns one by one, there is no need to specify the environment:

> DT <- data.table(V1=1:1000,
+                  V2=2001:3000)
> col.names <- c("V3","V4")
> col.specs <- vector("list",2)
> col.specs[[1]] <- quote(V1**2)
> col.specs[[2]] <- quote((V1+V2)/2)
> for (i in 1L:length(col.names)) {
+   DT[,col.names[i] := list(eval(col.specs[[i]]))]
+ }
> head(DT)
   V1   V2 V3   V4
1:  1 2001  1 1001
2:  2 2002  4 1002
3:  3 2003  9 1003
4:  4 2004 16 1004
5:  5 2005 25 1005
6:  6 2006 36 1006

Upvotes: 2

Views: 95

Answers (1)

Frank

Reputation: 66819

Since things are easier with a single quoted expression...

library(data.table)
DT <- data.table(V1=1:1000, V2=2001:3000)

new_cols = list(
  V3 = quote(V1**2),
  v4 = quote((V1+V2)/2)
)

e = as.call(c(quote(`:=`), new_cols))
DT[, eval(e)]

Then you can freely add to or edit new_cols with the names in close proximity to the exprs.
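To make the construction above easier to follow, here is a minimal sketch that prints the call before evaluating it. It assumes data.table is installed; the extra column `V5` is a hypothetical addition, used only to show that extending `new_cols` needs no other change.

```r
library(data.table)
DT <- data.table(V1 = 1:1000, V2 = 2001:3000)

new_cols <- list(
  V3 = quote(V1^2),
  v4 = quote((V1 + V2) / 2)
)
# Hypothetical extra column: appended like any other list element
new_cols$V5 <- quote(V2 - V1)

# Prepend the `:=` symbol and turn the named list into one unevaluated call
e <- as.call(c(quote(`:=`), new_cols))
print(e)  # the functional form: `:=` with one named argument per new column

# data.table evaluates the call inside j and adds the columns by reference
DT[, eval(e)]
head(DT)
```

Printing `e` first is a cheap way to check that every name is paired with the intended expression before `DT` is modified.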

Sources: Arun, and me citing him before.


Side note. The syntax above is

`:=`(col = v, col2 = v2, ...)

But we should also be able to do

c("col", "col2") := list(v, v2)
# aka
`:=`(c("col", "col2"), list(v, v2))

However, I can't figure out how to do it:

DT <- data.table(V1=1:1000, V2=2001:3000)
e2 = as.expression(list(quote(`:=`), names(new_cols), unname(new_cols)))
# gives an error:
DT[, eval(e2)]

# even though it works when written directly:
DT[, `:=`(c("V3", "v4"), list(V1^2, (V1 + V2)/2))]

I'd like to know how to do that, though...
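One possible approach, offered as a sketch rather than a confirmed answer: `as.expression()` returns an expression vector with three separate elements instead of a single call to `:=`, and `unname(new_cols)` is a plain list of calls rather than a call to `list()`. Building both pieces with `as.call()` should give the intended structure. The names `rhs` and `e2` are only illustrative, and this assumes data.table accepts a character vector embedded directly as the left-hand side.

```r
library(data.table)
DT <- data.table(V1 = 1:1000, V2 = 2001:3000)

new_cols <- list(
  V3 = quote(V1^2),
  v4 = quote((V1 + V2) / 2)
)

# Right-hand side: the unevaluated call list(V1^2, (V1 + V2)/2)
rhs <- as.call(c(quote(list), unname(new_cols)))

# Full call: `:=`(c("V3", "v4"), list(...)); the names go in as a constant
e2 <- as.call(list(quote(`:=`), names(new_cols), rhs))
print(e2)

DT[, eval(e2)]
head(DT)
```

The named form shown earlier keeps each name next to its expression, so this variant is mainly useful when the names and expressions already live in separate objects.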

Upvotes: 1
