Reputation: 1287
I am trying to use a variable that has the column names of a data.table
that is the outcome of a non-equi join
.
If I use .SDcols
to define the column names it throws an error on the prefixed column names.
How do I access these new columns that are an outome of the join?
See a reprex below.
library(data.table)
set.seed(14)
dt1 <- data.table(id =c(1:10), height = rpois(10,170),weight = rpois(10,50))
dt2 <- data.table(id =c(101:110), height = rpois(10,170),weight = rpois(10,50))
# this works
dt1[dt2,on = .(height<height,weight>weight),nomatch=0,.(x.height,i.height)]
#> x.height i.height
#> 1: 167 179
#> 2: 161 207
#> 3: 167 207
#> 4: 186 207
#> 5: 183 207
#> 6: 161 174
#> 7: 167 174
#> 8: 161 168
#> 9: 167 168
#> 10: 167 175
#> 11: 161 165
#> 12: 161 173
#> 13: 171 173
#> 14: 170 173
#> 15: 167 173
#> 16: 169 173
#> 17: 165 173
#> 18: 167 173
# this **does not** work
colnames=c("x.height","i.height")
dt1[dt2,.SD,on = .(height<height,weight>weight),nomatch=0,.SDcols = colnames]
#> Error in `[.data.table`(dt1, dt2, .SD, on = .(height < height, weight > : Some items of .SDcols are not column names: [x.height, i.height]
Created on 2022-07-06 by the reprex package (v2.0.1)
Upvotes: 1
Views: 486
Reputation: 6489
The two approaches that you mentioned do different things. The error is normal and expected.
In your first approach dt1[dt2,on = .(height<height,weight>weight),nomatch=0,.(x.height,i.height)]
, you are selecting the columns x.height,i.height
on the joined dataset (which contains x.height
and i.height
). This in not the case in your second approach (dt1[dt2,.SD,on = .(height<height,weight>weight),nomatch=0,.SDcols = colnames]
). Here is the definition of .SDcols as from the documentation (see help(data.table)
).
Specifies the columns of x to be included in the special symbol .SD which stands for Subset of data.table.
where x corresponds to x in x[i, j, by,...].
In your example, x correspond to dt1, which is not the output of the join but the left data.table in the join. So column names in .SDcols
that are not in colnames(dt1)
would return an error.
Also note that the prefixes x.
and i.
are useful when selecting columns in j
and not when specifying columns in .SDcols
(see argument j
in help(data.table)
).
When i is a data.table, the columns of i can be referred to in j by using the prefix i., e.g., X[Y, .(val, i.val)]. Here val refers to X's column and i.val Y's.
Columns of x can now be referred to using the prefix x. and is particularly useful during joining to refer to x's join columns as they are otherwise masked by i's. For example, X[Y, .(x.a-i.a, b), on="a"]
As an answer to your question How do I access these new columns that are an outome of the join?, One way is to proceed as @Sweepy Dodo did. Another way would be to use the dev version of data.table (1.14.3), which has a programming interface:
# get dev version
update.dev.pkg()
colnames = list("x.height", "i.height") # note the use of 'list' and not 'c'
dt1[dt2, .j, on = .(height<height,weight>weight),nomatch=0, env=list(.j=colnames)]
x.height i.height
<int> <int>
1: 167 179
2: 161 207
3: 167 207
4: 186 207
5: 183 207
6: 161 174
7: 167 174
8: 161 168
9: 167 168
10: 167 175
11: 161 165
12: 161 173
13: 171 173
14: 170 173
15: 167 173
16: 169 173
17: 165 173
18: 167 173
Upvotes: 1
Reputation: 1863
Try this:
colnames = c("x.height","i.height")
dt1[dt2
, on = .(height < height, weight > weight)
, nomatch = 0
, mget(colnames)
]
Upvotes: 2