Jakob
Jakob

Reputation: 1463

merge two data.tables with all.x = T while expanding due to duplicated ID's

I can't really believe that I'm the first one who is asking this but I haven't found a real duplicate. I want to merge all occurences in a second data.table to a first one. The second one has duplicated ID's and I want to add extra rows for that, duplicating the ids. However, I do not want to add the rows that do not match id's in the first data.table. Here is a reproducible example:

Starting from this data.table:

library(data.table)
DT1 <- data.table( #x = c(1,NA),
               y = c("a","b"),
               z = c(333,NA))

   y   z
1: a 333
2: b  NA

I want to add (merge) this data.table:

DT2 <- data.table(y = c("a","a","c"),
              z = c(444, 555, 666))

   y   z
1: a 444
2: a 555
3: c 666

so that I get this data.table:

   y   z
1: a 333
2: a 444
3: a 555
4: b  NA

without an entry for y=c but duplicating entries for y=a. If I merge everything, I still have the c:

DT_merged_wrong <- merge(DT1,DT2, all = TRUE, by= c("y", "z"))

if I merge by all.x=T, it drops the duplicates, which I need, however:

DT_merged_alsowrong <- merge(DT1,DT2, all.x = TRUE, by= c("y", "z"))

the filter just produces the inner join:

DT_merged_stillwrong <- DT1[DT2, on= c("y", "z")]

I hope this is a quick fix for someone more familiar with the merge properties of data.table, I can't find the answer.

Edit: Thanks for the answer below, I can also recommend the rqdatatable package that takes care of this and many other problems.

Upvotes: 2

Views: 624

Answers (1)

s_baldur
s_baldur

Reputation: 33548

Looks like you can solve it with rbind():

rbind(DT1, DT2[y %in% DT1$y])

   y   z
1: a 333
2: b  NA
3: a 444
4: a 555

Could be done with join syntax and some post-processing:

DT2[DT1, on = "y"
    ][, unique(c(z, i.z)), by = y]

Upvotes: 2

Related Questions