Reputation: 10167
Given two data tables (tbl_A
and tbl_B
), I would like to select all the rows in the tbl_A
which have matching rows in tbl_B
, and I would like the code to be expressive. If the %in%
operator were defined for data.tables, something like this would be be ideal:
subset <- tbl_A[tbl_A %in% tbl_B]
I can think of many ways to accomplish what I want such as:
# double negation (set differences)
subset <- tbl_A[!tbl_A[!tbl_B,1,keyby=a]]
# nomatch with keyby and this annoying `[,V1:=NULL]` bit
subset <- tbl_B[,1,keyby=.(a=x)][,V1:=NULL][tbl_A,nomatch=0L]
# nomatch with !duplicated() and setnames()
subset <- tbl_B[!duplicated(tbl_B),.(x)][tbl_A,nomatch=0L]; setnames(subset,"x","a")
# nomatch with !unique() and setnames()
subset <- unique(tbl_B)[,.(x)][tbl_A,nomatch=0L]; setnames(subset,"x","a")
# use of a temporary variable (Thanks @Frank)
subset <- tbl_A[, found := FALSE][tbl_B, found := TRUE][(found)][,found:=NULL][]
but each expression is difficult to read and it's not obvious at first glance what the code is doing. Is there a more idiomatic / expressive way of accomplishing this task?
For purposes of example, here are some toy data.tables:
# toy tables
tbl_A <- data.table(a=letters[1:5],
b=1:5,
c=rnorm(5))
tbl_B <- data.table(x=letters[3:7],
y=13:17,
z=rnorm(5))
# both tables might have multiple rows with the same key fields.
tbl_A <- rbind(tbl_A,tbl_A)
tbl_B <- rbind(tbl_B,tbl_B)
setkey(tbl_A,a)
setkey(tbl_B,x)
and an expected result containing the rows in tbl_A
which match at least one row in tbl_B
:
a b c
1: c 3 -0.5403072
2: c 3 -0.5403072
3: d 4 -1.3353621
4: d 4 -1.3353621
5: e 5 1.1811730
6: e 5 1.1811730
Upvotes: 5
Views: 985
Reputation: 25225
Adding 2 more options
tbl_A[fintersect(tbl_A[,.(a)], tbl_B[,.(a=x)])]
and
tbl_A[unique(tbl_A[tbl_B, nomatch=0L, which=TRUE])]
Upvotes: 2
Reputation: 345
I'm not sure how expressive it is (apologies if not) but this seems to work:
tbl_A[,.(a,b,c,any(a == tbl_B[,x])), by = a][V4==TRUE,.(a,b,c)]
I'm sure it can be improved - I only found out about any() yesterday and still testing it :)
Upvotes: 0