Lazarus Thurston
Lazarus Thurston

Reputation: 1287

.SDcols does on work on x.colname and i.colname prefixes

I am trying to use a variable that has the column names of a data.table that is the outcome of a non-equi join.

If I use .SDcols to define the column names it throws an error on the prefixed column names.

How do I access these new columns that are an outome of the join?

See a reprex below.

library(data.table)
set.seed(14)
dt1 <- data.table(id =c(1:10), height = rpois(10,170),weight = rpois(10,50))
dt2 <- data.table(id =c(101:110), height = rpois(10,170),weight = rpois(10,50))
# this works
dt1[dt2,on = .(height<height,weight>weight),nomatch=0,.(x.height,i.height)]
#>     x.height i.height
#>  1:      167      179
#>  2:      161      207
#>  3:      167      207
#>  4:      186      207
#>  5:      183      207
#>  6:      161      174
#>  7:      167      174
#>  8:      161      168
#>  9:      167      168
#> 10:      167      175
#> 11:      161      165
#> 12:      161      173
#> 13:      171      173
#> 14:      170      173
#> 15:      167      173
#> 16:      169      173
#> 17:      165      173
#> 18:      167      173

# this **does not** work
colnames=c("x.height","i.height")
dt1[dt2,.SD,on = .(height<height,weight>weight),nomatch=0,.SDcols = colnames]
#> Error in `[.data.table`(dt1, dt2, .SD, on = .(height < height, weight > : Some items of .SDcols are not column names: [x.height, i.height]

Created on 2022-07-06 by the reprex package (v2.0.1)

Upvotes: 1

Views: 486

Answers (2)

B. Christian Kamgang
B. Christian Kamgang

Reputation: 6489

The two approaches that you mentioned do different things. The error is normal and expected. In your first approach dt1[dt2,on = .(height<height,weight>weight),nomatch=0,.(x.height,i.height)], you are selecting the columns x.height,i.height on the joined dataset (which contains x.height and i.height). This in not the case in your second approach (dt1[dt2,.SD,on = .(height<height,weight>weight),nomatch=0,.SDcols = colnames]). Here is the definition of .SDcols as from the documentation (see help(data.table)).

Specifies the columns of x to be included in the special symbol .SD which stands for Subset of data.table.

where x corresponds to x in x[i, j, by,...].

In your example, x correspond to dt1, which is not the output of the join but the left data.table in the join. So column names in .SDcols that are not in colnames(dt1) would return an error.

Also note that the prefixes x. and i. are useful when selecting columns in j and not when specifying columns in .SDcols (see argument j in help(data.table)).

When i is a data.table, the columns of i can be referred to in j by using the prefix i., e.g., X[Y, .(val, i.val)]. Here val refers to X's column and i.val Y's.

Columns of x can now be referred to using the prefix x. and is particularly useful during joining to refer to x's join columns as they are otherwise masked by i's. For example, X[Y, .(x.a-i.a, b), on="a"]

As an answer to your question How do I access these new columns that are an outome of the join?, One way is to proceed as @Sweepy Dodo did. Another way would be to use the dev version of data.table (1.14.3), which has a programming interface:

# get dev version
update.dev.pkg()

colnames = list("x.height", "i.height")      # note the use of 'list' and not 'c'
dt1[dt2, .j, on = .(height<height,weight>weight),nomatch=0, env=list(.j=colnames)]

    x.height i.height
       <int>    <int>
 1:      167      179
 2:      161      207
 3:      167      207
 4:      186      207
 5:      183      207
 6:      161      174
 7:      167      174
 8:      161      168
 9:      167      168
10:      167      175
11:      161      165
12:      161      173
13:      171      173
14:      170      173
15:      167      173
16:      169      173
17:      165      173
18:      167      173

Upvotes: 1

Sweepy Dodo
Sweepy Dodo

Reputation: 1863

Try this:

colnames = c("x.height","i.height")

dt1[dt2
    , on = .(height < height, weight > weight)
    , nomatch = 0
    , mget(colnames)
    ]

Upvotes: 2

Related Questions