I want to do a full outer join using the fabulous data.table package in R, keeping unmatched rows as well. My two data.tables "left" and "right" look like
key | data_left
1 | aaa
2 | bbb
3 | ccc
and
key | data_right
1 | xxx
2 | yyy
Joining them on the key column "key" gives me
key | data_left | data_right
1 | aaa | xxx
2 | bbb | yyy
However, the unmatched row 3 | ccc is completely missing. Adding the option nomatch=0 (instead of nomatch=NA) did not help. I want data.table to just fill the remaining columns with NA, so I expect
key | data_left | data_right
1 | aaa | xxx
2 | bbb | yyy
3 | ccc | NA
Any idea how I can get this to work?
Code sample:
library(data.table)
left = data.table(keyCol = c(1,2,3), data_left = c("aaa", "bbb", "ccc"))
right = data.table(keyCol = c(1,2), data_right = c("xxx", "yyy"))
setkey(left, keyCol)
setkey(right, keyCol)
# both calls return only keyCol 1 and 2
resNA = left[right, allow.cartesian=TRUE, nomatch=NA]
res0 = left[right, allow.cartesian=TRUE, nomatch=0]
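For context, a minimal sketch of why the attempts above drop key 3: in data.table's X[i] syntax every row of i is kept and the matching rows of x are looked up, so left[right] can only return the keys that exist in right. nomatch=NA (the default) fills in NA where x has no match, while nomatch=0 drops those rows instead. Swapping the roles to right[left] keeps all three rows of left; the setcolorder() call is only there to restore the column order shown in the question.

```r
library(data.table)

left  = data.table(keyCol = c(1, 2, 3), data_left  = c("aaa", "bbb", "ccc"))
right = data.table(keyCol = c(1, 2),    data_right = c("xxx", "yyy"))
setkey(left, keyCol)
setkey(right, keyCol)

# X[i] keeps every row of i, so put the table whose rows must all survive in i.
# Rows of left with no match in right get NA in data_right (nomatch = NA is the default).
res = right[left]

# X[i] puts x's columns first; reorder to match the expected output
setcolorder(res, c("keyCol", "data_left", "data_right"))
print(res)
```

The result has keyCol 1 to 3, with NA in data_right for key 3. data.table's merge method expresses the same left join as merge(left, right, by = "keyCol", all.x = TRUE), and all = TRUE gives a full outer join.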
Assuming there is at most one row per keyCol value, I'd do:
# setup
kc = "keyCol"
DTs = list(left, right)
# make main table with key col(s)
DT = unique(rbindlist(lapply(DTs, `[`, j = ..kc)))
# get non-key cols
for (d in DTs){
  cols = setdiff(names(d), kc)
  # update join: add d's non-key columns to DT by reference (NA where d has no match)
  DT[d, on=kc, (cols) := mget(sprintf("i.%s", cols))]
}
# cleanup loop vars
rm(d, cols)
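A self-contained check of the approach above on the question's data. The only change is that the key columns are extracted with an anonymous function, function(d) d[, ..kc], which finds kc through ordinary lexical scoping; otherwise it is the same update-join loop. The result should be the expected three-row table, with NA in data_right for keyCol 3.

```r
library(data.table)

left  = data.table(keyCol = c(1, 2, 3), data_left  = c("aaa", "bbb", "ccc"))
right = data.table(keyCol = c(1, 2),    data_right = c("xxx", "yyy"))

kc  = "keyCol"
DTs = list(left, right)

# One row per distinct key value found in any of the tables
DT = unique(rbindlist(lapply(DTs, function(d) d[, ..kc])))

for (d in DTs) {
  cols = setdiff(names(d), kc)
  # Update join by reference; DT rows with no match in d get NA in the new columns
  DT[d, on = kc, (cols) := mget(sprintf("i.%s", cols))]
}
print(DT)
```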
This should work for more general cases, with multiple key columns (in kc) and more than two tables (in DTs). If you want the key cols as the key in the result, the code simplifies a little:
# make main table with key col(s)
DT = setkey(unique(rbindlist(lapply(DTs, `[`, j = ..kc))))
# get non-key cols
for (d in DTs){
  cols = setdiff(names(d), kc)
  # keyed join: DT's key is matched against d's leading column(s), so on= is not needed
  DT[d, (cols) := mget(sprintf("i.%s", cols))]
}
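To illustrate the more general case with the keyed variant, here is a sketch using three small tables invented for this example (tab_a, tab_b, tab_c and their columns are hypothetical, not from the question), where each table has keys the others lack. For comparison, it also builds the same full outer join by chaining data.table's merge(..., all = TRUE) with Reduce(); the two results should agree column by column.

```r
library(data.table)

# Hypothetical tables: id is the key column in each and is listed first
tab_a = data.table(id = c(1, 2), va = c("a1", "a2"), key = "id")
tab_b = data.table(id = c(2, 3), vb = c("b2", "b3"), key = "id")
tab_c = data.table(id = c(1, 4), vc = c("c1", "c4"), key = "id")

kc  = "id"
DTs = list(tab_a, tab_b, tab_c)

# All distinct key values; setkey() with no columns keys (and sorts) by every column
DT = setkey(unique(rbindlist(lapply(DTs, function(d) d[, ..kc]))))

for (d in DTs) {
  cols = setdiff(names(d), kc)
  # Keyed join: DT's key is matched against d's leading column
  DT[d, (cols) := mget(sprintf("i.%s", cols))]
}
print(DT)

# The same full outer join via repeated merges, for comparison
ref = Reduce(function(x, y) merge(x, y, by = kc, all = TRUE), DTs)
print(ref)
```

Every id from 1 to 4 appears once, and each value column is NA wherever its source table lacks that id. The merge chain is simpler to read, while the update-join loop adds columns to a single table by reference.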