Nils Gudat
Nils Gudat

Reputation: 13800

Access data.table columns with strings

Apologies for a question that probably makes it obvious that I usually work in Python/pandas, but I'm stuck with this. How do I select a data.table column using a string?

dt$"string"
dt$as.name("string")
dt$get("string")

I'm sure this is super simple, but I'm not getting it. Any help is greatly appreciated!


-------------- EDITED TO ADD ----------------------

After some of the helpful comments and tips below, I think I've narrowed down the problem a bit and have a reproducible example. Consider:

dt = data.table(ID = c("a","a","a","b","b","b"), col1=rnorm(6), col2=rnorm(6)*100)

And assume we want to assign the values in col2 to col1. As I've learned below, the data.table syntax for this would be dt[,col1:=col2], clean and simple. The problems start when one (or both) of the variables in the j argument are strings. I found the following:

dt[, "col1":=col2] works as expected

dt[, "col1":="col2"] fails as expected (tries to assign the character col2 to the double vector col1

dt[, "col1":=get("col2")] works as expected

dt[, get("col1")] returns col1 as expected

but: dt[, get("col1"):=col2] or any other assignment fails.

Some context: the reason for doing this is that I'm constructing strings in a loop, to access a larger number of columns that are all named colname_colnumber, i.e. I loop over colname and colnumber to then access column paste0(colname,colnumber).

Upvotes: 12

Views: 15417

Answers (3)

MichaelChirico
MichaelChirico

Reputation: 34703

I'll add that if you want a bunch of columns, you may wish to use something like:

dt[ , c("id", paste0("col", 1:10)), with = FALSE]

As @Arun adds below, other options for getting multiple columns are:

dt[ , mget(c("id", paste0("col", 1:5)))]

and

dt[ , .SD, .SDcols = c("id", paste0("col", 1:5))]

In recent versions of data.table (e.g. current CRAN) you can also use the "up-a-level" notation like:

keep_cols = c('id', paste0('col', 1:5))
dt[ , ..keep_cols]

For reference, mget seems to be very slow; .SDcols is fastest, but competitive with with = FALSE; I personally find all to be useful/most natural in different situations.

Here's a simple benchmark:

NN <- 10000L
MM <- 100L
mm <- 10L

DT = data.table(id = 1:NN)
DT[ , paste0("col", 1:MM) := lapply(integer(MM), function(x) runif(NN))]

sdcols = function(...) DT[ , .SD, .SDcols = paste0("col", sample(MM, size = mm))]
m.get = function(...) DT[ , mget(paste0("col", sample(MM, size=mm)))]
withF = function(...) DT[ , paste0("col", sample(MM, size = mm)), with = FALSE]

library(microbenchmark)
microbenchmark(times=100L, sdcols(), m.get(), withF())
# Unit: microseconds
#      expr      min        lq      mean    median        uq      max neval cld
#  sdcols()  780.201  810.4350  865.3564  827.4970  853.4875 2354.577   100 a  
#   m.get() 2792.293 2864.1225 3052.3872 2899.9370 3031.9260 4831.963   100   c
#   withF()  897.822  927.7105 1005.3166  945.9495  981.0580 2600.445   100  b 

Upvotes: 7

Evertvdw
Evertvdw

Reputation: 947

You can do assignments without get but using brackets:

dt[, ("col1"):=col2]

instead of:

dt[, get("col1"):=col2]

See for more explanation: Select / assign to data.table variables which names are stored in a character vector

Upvotes: 5

Andrie
Andrie

Reputation: 179408

You can use get() as the j argument using single brackets:

library(data.table)
dt <- data.table(iris)
dt[, get("Species")]

The result:

[1] setosa     setosa     setosa     setosa     setosa     setosa .....

You can also use a string directly inside the double bracket operator, like this:

dt[["Species"]]

Upvotes: 16

Related Questions