Reputation: 952
Is there a better way to leverage the power of merge and join in R? Merge looses unique rows and join creates duplicate and partially filled columns.
Dataframe1
Key Col1 Col2 Col3
A 1 2 3
B 2 4 6
Dataframe2
Key Col1 Col2 Col4
A 1 2 4
C 3 6 12
D 4 8 20
Merged Dataframe
Key Col1 Col2 Col3 Col4
A 1 2 3 4
B 2 4 6 <NA>
C 3 6 <NA> 12
D 4 8 <NA> 20
Upvotes: 2
Views: 75
Reputation: 887223
We could bind the datasets with bind_rows
and then do a group by summarise or reframe
to return only the non-NA rows
library(dplyr)
bind_rows(df1, df2) %>%
group_by(Key) %>%
reframe(across(everything(), ~ .x[!is.na(.x)][1]))
-output
# A tibble: 4 × 5
Key Col1 Col2 Col3 Col4
<chr> <int> <int> <int> <int>
1 A 1 2 3 4
2 B 2 4 6 NA
3 C 3 6 NA 12
4 D 4 8 NA 20
Or may use powerjoin
library(powerjoin)
power_full_join(df1, df2, by = "Key", conflict = coalesce_xy) %>%
select(Key, order(names(.)[-1])+1)
-output
Key Col1 Col2 Col3 Col4
1 A 1 2 3 4
2 B 2 4 6 NA
3 C 3 6 NA 12
4 D 4 8 NA 20
df1 <- structure(list(Key = c("A", "B"), Col1 = 1:2, Col2 = c(2L, 4L
), Col3 = c(3L, 6L)), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(Key = c("A", "C", "D"), Col1 = c(1L, 3L, 4L),
Col2 = c(2L, 6L, 8L), Col4 = c(4L, 12L, 20L)),
class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 2
Reputation: 78937
Here is an option with full_join
combined with coalesce
:
library(dplyr)
full_join(df1, df2, by="Key") %>%
mutate(Col1 = coalesce(Col1.x, Col1.y),
Col2 = coalesce(Col2.x, Col2.y), .before="Col3") %>%
select(-contains("."))
Key Col1 Col2 Col3 Col4
1 A 1 2 3 4
2 B 2 4 6 NA
3 C 3 6 NA 12
4 D 4 8 NA 20
Upvotes: 2