polTo
polTo

Reputation: 290

R data.table keyby multiple columns when chaining

I want to create a new data table with just two columns and join it with another table using these two columns as keys. I could do this:

a.dt <- data.table(x = 1:3, y = 2:4)
setkey(a.dt,x,y)
key(a.dt)
# [1] "x" "y"
# ...(join)

But I don't want to use setkey - I want to chain immediately into the join, i.e. after creating the table I want to keyby both new variables. For a reason that is not clear to me I'm getting this error:

a.dt <- data.table(x = 1:3, y = 2:4)[, .SD, keyby = .(x,y)]
# Error in jsub[[ii]] : attempt to select less than one element

This is my ugly workaround

a.dt <- data.table(x = 1:3, y = 2:4, z = 3:5)[, .SD, keyby = .(x,y)][,z := NULL]
key(a.dt)
# [1] "x" "y"

Can somebody explain why I can't keyby both variables? Am I doing something fundamentally silly?

Edit: for completeness adding session info

sessionInfo()
# R version 3.2.1 (2015-06-18)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 8 x64 (build 9200)

# locale:
# [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] data.table_1.9.4 plyr_1.8.2       foreign_0.8-63  

# loaded via a namespace (and not attached):
# [1] magrittr_1.5   tools_3.2.1    reshape2_1.4.1 Rcpp_0.12.0    stringi_0.4-1  stringr_1.0.0  chron_2.3-45  

Upvotes: 3

Views: 5905

Answers (1)

Arun
Arun

Reputation: 118859

You can key while creating a data.table using the key argument:

require(data.table) #v1.9.6
DT2 = data.table(x=3:4, y=4:5, w=5:6)
DT1 = data.table(x=1:3, y=2:4, z=7:9, key=c("x", "y"))[DT2]

Alternatively, from v1.9.6 on you can use the on= argument to specify the columns on which the join should be performed.

DT2 = data.table(a=3:4, b=4:5, w=5:6)
DT1 = data.table(x=1:3, y=2:4, z=7:9)[DT2, on=c(x="a", y="b")]

Another way is to use setDT() on a list along with the key argument.

DT2 = data.table(a=3:4, b=4:5, w=5:6)
DT1 = setDT(list(x=1:3, y=2:4, z=7:9), key=c("x", "y"))[DT2]

dt[, .SD, keyby=.(x, y)] returns an empty data.table for me (as it should). Not sure which version of data.table you're using. This is because .SD by default gets assigned all columns except the columns mentioned in by=. Since there are two columns, and you've provided both to group by, .SD is empty. Therefore an empty data.table is returned. If you really want this behaviour, you'll need to explicitly set .SDcols as follows:

dt[, .SD, keyby=.(x, y), .SDcols=c("x", "y")]
# or from v1.9.6 we can also do
dt[, .SD, keyby=.(x, y), .SDcols=x:y]

But note that this'll return x,y twice.

Upvotes: 7

Related Questions