Reputation: 1463
I can't really believe that I'm the first one who is asking this but I haven't found a real duplicate. I want to merge all occurences in a second data.table to a first one. The second one has duplicated ID's and I want to add extra rows for that, duplicating the ids. However, I do not want to add the rows that do not match id's in the first data.table. Here is a reproducible example:
Starting from this data.table:
library(data.table)
DT1 <- data.table( #x = c(1,NA),
y = c("a","b"),
z = c(333,NA))
y z
1: a 333
2: b NA
I want to add (merge) this data.table:
DT2 <- data.table(y = c("a","a","c"),
z = c(444, 555, 666))
y z
1: a 444
2: a 555
3: c 666
so that I get this data.table:
y z
1: a 333
2: a 444
3: a 555
4: b NA
without an entry for y=c
but duplicating entries for y=a
. If I merge everything, I still have the c:
DT_merged_wrong <- merge(DT1,DT2, all = TRUE, by= c("y", "z"))
if I merge by all.x=T
, it drops the duplicates, which I need, however:
DT_merged_alsowrong <- merge(DT1,DT2, all.x = TRUE, by= c("y", "z"))
the filter just produces the inner join:
DT_merged_stillwrong <- DT1[DT2, on= c("y", "z")]
I hope this is a quick fix for someone more familiar with the merge properties of data.table, I can't find the answer.
Edit: Thanks for the answer below, I can also recommend the rqdatatable
package that takes care of this and many other problems.
Upvotes: 2
Views: 624
Reputation: 33548
Looks like you can solve it with rbind()
:
rbind(DT1, DT2[y %in% DT1$y])
y z
1: a 333
2: b NA
3: a 444
4: a 555
Could be done with join syntax and some post-processing:
DT2[DT1, on = "y"
][, unique(c(z, i.z)), by = y]
Upvotes: 2