Reputation: 135
I have two incomplete data.tables with the same column names.
dt1 <- data.table(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c"))
dt2 <- data.table(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))
They look like this:
dt1
id v1 v2
1: 1 w a
2: 2 x <NA>
3: 3 <NA> c
> dt2
id v1 v2
1: 2 <NA> b
2: 3 y c
3: 4 z <NA>
Is there a way to merge the two by filling in the missing info?
This is the result I'm after:
id v1 v2
1: 1 w a
2: 2 x b
3: 3 y c
4: 4 z <NA>
I've tried various data.table joins, merges but I either get the columns repeated:
> merge(dt1,
+ dt2,
+ by = "id",
+ all = TRUE)
id v1.x v2.x v1.y v2.y
1: 1 w a <NA> <NA>
2: 2 x <NA> <NA> b
3: 3 <NA> c y c
4: 4 <NA> <NA> z <NA>
or the rows repeated:
> merge(dt1,
+ dt2,
+ by = names(dt1),
+ all = TRUE)
id v1 v2
1: 1 w a
2: 2 <NA> b
3: 2 x <NA>
4: 3 <NA> c
5: 3 y c
6: 4 z <NA>
Both data.tables have the same column names.
Upvotes: 3
Views: 362
Reputation: 51592
You can group by ID and get the unique values after omitting NAs, i.e.
library(data.table)
merge(dt1, dt2, all = TRUE)[,
lapply(.SD, function(i)na.omit(unique(i))),
by = id][]
# id v1 v2
#1: 1 w a
#2: 2 x b
#3: 3 y c
#4: 4 z <NA>
Upvotes: 3
Reputation: 41367
First full_join
and after that group_by
per id and merge the rows:
library(dplyr)
library(tidyr)
dt1 %>%
full_join(dt2, by = c("id", "v1", "v2")) %>%
group_by(id) %>%
fill(starts_with('v'), .direction = 'updown') %>%
slice(1) %>%
ungroup
Output:
# A tibble: 4 × 3
id v1 v2
<dbl> <chr> <chr>
1 1 w a
2 2 x b
3 3 y c
4 4 z NA
Upvotes: 0
Reputation: 33488
You could also start out with rbind():
rbind(dt1, dt2)[, lapply(.SD, \(x) unique(x[!is.na(x)])), by = id]
# id v1 v2
# <num> <char> <char>
# 1: 1 w a
# 2: 2 x b
# 3: 3 y c
# 4: 4 z <NA>
Upvotes: 3