Reputation: 14370
I am trying to do something very simple with data.table
and I lost the idiomatic way to do it
library(data.table)
set.seed(1)
DT = data.table(a=sample(letters,1e5,T), b=sample(letters,1e5,T), c=rnorm(1e5))
DT2 = data.table(a=sample(letters,5,T), b=sample(letters,5,T))
DT2
a b
1: k h
2: e v
3: f n
4: m q
5: w v
I want to select the rows of DT
that match those of DT2
.
As such the number of rows after operation will always be smaller that the initial table.
I want something doing this:
> DT[paste(a,b) %chin% DT2[,paste(a,b)]]
a b c
1: m q -0.4974579
2: e v -0.1325602
3: w v -1.8081050
4: m q 0.9025120
5: w v -0.4958802
---
729: f n 0.5604650
730: f n -1.2607321
731: m q 0.5146013
732: m q -1.8329656
733: k h -0.9752011
> DT2[paste(a,b) %chin% DT[,paste(a,b)]]
a b
1: e v
2: f n
3: k h
4: m q
5: w v
>
Upvotes: 2
Views: 50
Reputation: 52637
An inner join should do:
setkey(DT, a, b)[DT2, nomatch=0]
Produces:
a b c
1: k h -1.6592442
2: k h 1.1946471
3: k h -0.8694933
4: k h 0.7789158
5: k h -1.3142607
---
729: w v -0.3516787
730: w v 0.5272145
731: w v -0.7531717
732: w v 0.3352228
733: w v 0.1182353
If you want to know which values in DT2
exist in DT
then:
unique(setkey(DT[, .(a, b)], a, b))[DT2, nomatch=0]
Upvotes: 3