sebastian-c
sebastian-c

Reputation: 15415

How can I perform a "setdiff" merge using data.table?

Say I have two tables:

library(data.table)
set.seed(1)

tab1 <- data.table(
  let = rep(letters[1:2], each = 3),
  num = rep(1:3, 2),
  val = rnorm(6),
  key = c("let", "num")
)

tab2 <- data.table(
  let = rep(letters[1:2], each = 2),
  num = rep(1:2, 2),
  val = rnorm(4),
  key = c("let", "num")
)

Table 1:

> tab1
   let num        val
1:   a   1 -0.6264538
2:   a   2  0.1836433
3:   a   3 -0.8356286
4:   b   1  1.5952808
5:   b   2  0.3295078
6:   b   3 -0.8204684

Table 2:

> tab2
   let num
1:   a   1
2:   a   2
3:   b   1
4:   b   2

Is there a way to "merge" these tables such that I get all the results in tab1 that are not in tab2?:

   let num        val
1:   a   3 -0.8356286
2:   b   3 -0.8204684

Upvotes: 12

Views: 4071

Answers (2)

etienne
etienne

Reputation: 3678

One solution would be to do a merge and remove the rows where there are values from tab2

d<-as.data.frame(merge(tab1,tab2,all=T))
t<-is.na(d[,4])
d[t,][,-4]

 let num      val.x
3   a   3 -0.8356286
6   b   3 -0.8204684

Using data.table :

merge(tab1,tab2,all=T)[is.na(val.y), 1:3]

 let num      val.x
1:   a   3 -0.8356286
2:   b   3 -0.8204684

Upvotes: 0

Arun
Arun

Reputation: 118879

In this case, it's equivalent to an anti join:

tab1[!tab2, on=c("let", "num")]

But setdiff() would only the first row for every let,num. This is marked for v1.9.8, FR #547.

Upvotes: 20

Related Questions