Stephen Done

Reputation: 502

How to pass a list of columns to data.table where some are predetermined

Pass character vectors and column names to data.table as a list of columns?

I want to select a subset of columns from a data.table where some of the column names are determined earlier and passed along as a character vector, and the rest are a fixed set of columns written directly in the call.

That is, given this:

a <- 1:4
b <- 5:8
c <- c('aa','bb','cc','dd')
e <- 1:4

z <- data.table(a,b,c,e)

I want to do this:

z[, list(a,b)]

Which produces this output:

   a b
1: 1 5
2: 2 6
3: 3 7
4: 4 8

But I want to do it in some way similar to this (which works, almost):

cols <- "b"
z[, list(get(cols), a)]

Result (note that the column selected through cols loses its name and shows up as V1):

   V1 a
1:  5 1
2:  6 2
3:  7 3
4:  8 4

However, I need this to work when cols has more than one element, and the following does not work:

cols <- c('a', 'b')
z[, list(mget(cols), c)]

The above produces the following error:

Error: value for ‘a’ not found

I think my problem lies with scoping and which environments mget is looking in, but I can't figure out what exactly I am doing wrong. Also, how do I preserve the column titles?

Upvotes: 5

Views: 2204

Answers (3)

Stephen Done

Reputation: 502

Combining a variable with column names with hard-coded column names in data.table

Given z and cols from the example above:

To combine the column names stored in the variable cols with the hard-coded column name c, we build a new character vector c(cols, 'c') inside the call to data.table. We can refer to cols from within j (the second argument within []) by using the "up-one-level" prefix ..:

z[, c(..cols, 'c')]

Thank you to @thelatemail for providing the base to the solution above.
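
As a hedged alternative sketch of the same idea, the combined character vector can also be built outside the call and passed with with = FALSE. The variable name all_cols is made up for illustration, and z is rebuilt here with named arguments so the snippet runs on its own:

```r
library(data.table)

# Same data as in the question, built with explicit column names
z <- data.table(a = 1:4, b = 5:8, c = c("aa", "bb", "cc", "dd"), e = 1:4)

cols <- c("a", "b")        # column names decided earlier
all_cols <- c(cols, "c")   # hypothetical name: predetermined names plus the fixed one

# with = FALSE makes data.table treat j as a character vector of column names
res <- z[, all_cols, with = FALSE]
print(names(res))
```

Because everything is an ordinary string here, the column names are preserved and no get() or mget() call is needed.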

Upvotes: 2

mnel

Reputation: 115505

Attempting to mix standard and non-standard evaluation within a single call will probably end in tears / frustration / obfuscated code.

There are a number of options in data.table

  1. Use .. notation to "look up one level" to find the vector of column names

    cols <- c('a','b')
    z[, ..cols]
    
  2. Use .SDcols

    z[, .SD, .SDcols = cols]
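
Both options above take a plain character vector, so one way to add a fixed column is to append its name as a string before selecting. A minimal sketch, assuming the z from the question (rebuilt here so the snippet is self-contained); the name keep is hypothetical:

```r
library(data.table)

z <- data.table(a = 1:4, b = 5:8, c = c("aa", "bb", "cc", "dd"), e = 1:4)

cols <- c("a", "b")
keep <- c(cols, "c")   # hypothetical name: predetermined names plus a fixed one

# Option 1: the .. prefix looks up keep in the calling scope
res_dots <- z[, ..keep]

# Option 2: .SDcols restricts .SD to the listed columns
res_sd <- z[, .SD, .SDcols = keep]

print(names(res_dots))
print(names(res_sd))
```

Either form keeps the original column names, since no column is referenced as a bare symbol.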
    

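But if you really want to combine the two ways of referencing, you can use something like the following. It introduces another option, with=FALSE, which makes data.table treat j as an expression that evaluates to column names rather than as an expression over the columns themselves, so j can be more general than a simple vector: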

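ll <- function(char = NULL, uneval = NULL) {
  # Capture the call so that uneval can be read as an unevaluated symbol
  Call <- match.call()
  # Convert the unevaluated symbol to a column-name string
  cols <- lapply(Call$uneval, as.character)
  # Return one character vector: evaluated names first, then the captured one
  unlist(c(char, cols))
}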
z[, ll(cols,c), with=FALSE]
#    a b  c
# 1: 1 5 aa
# 2: 2 6 bb
# 3: 3 7 cc
# 4: 4 8 dd

z[, ll(char=cols), with=FALSE]
#    a b
# 1: 1 5
# 2: 2 6
# 3: 3 7
# 4: 4 8

z[, ll(uneval=c), with=FALSE]
#     c
# 1: aa
# 2: bb
# 3: cc
# 4: dd

Upvotes: 4

eddi

Reputation: 49448

Here are two (pretty much equivalent) options. One using lapply:

z[, c(lapply(cols, get), list(c))]
#   V1 V2 V3
#1:  1  5 aa
#2:  2  6 bb
#3:  3  7 cc
#4:  4  8 dd

And one using mget:

z[, c(mget(cols, inherits = TRUE), c = list(c))]
#   a b  c
#1: 1 5 aa
#2: 2 6 bb
#3: 3 7 cc
#4: 4 8 dd

Note that get returns a vector which loses the information about column name (and there isn't much you can do about it besides manually adding it back in), while mget returns a named list.
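
The naming difference can be seen in plain base R, outside data.table. The sketch below uses a made-up environment env to stand in for wherever the columns live, and shows one way of manually adding the names back with setNames:

```r
# Hypothetical environment holding two "columns"
env <- list2env(list(a = 1:4, b = 5:8))
cols <- c("a", "b")

# get returns the bare value, so lapply yields an unnamed list
unnamed <- lapply(cols, get, envir = env)
print(names(unnamed))

# mget returns a list named after the requested objects
named <- mget(cols, envir = env)
print(names(named))

# Manually restoring the names on the get-based result
restored <- setNames(lapply(cols, get, envir = env), cols)
print(names(restored))
```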

Upvotes: 5
