Reputation: 974
I want to select the rows of data1 if the value of the first column exists in data2:
data1<-data.table(a=c(1,2,3,1),b=c(1,4,6,3))
data2<-data.table(a=c(1,3,1,5,6))
Desired output:
a b
1 1
3 6
1 3
The merge does not work since the keys are not unique, any other idea?
Upvotes: 3
Views: 2045
Reputation: 18612
Here's an approach without joins:
> unique(data1[a %in% data2[,a],])
a b
1: 1 1
2: 3 6
3: 1 3
Upvotes: 3
Reputation: 887851
We can create a sequence column ('ind') in each of the datasets for the 'a' column and 'join' after setting the key column as 'a' and 'ind'
data1[, ind:=1:.N, a]
data2[, ind:= 1:.N, a]
setkey(data1, a, ind)[data2, nomatch=0][, ind:=NULL]
# a b
#1: 1 1
#2: 3 6
#3: 1 3
Upvotes: 4