Join two data frames with partially common and unequal columns

Question

Is there a better way to leverage the power of merge and join in R? Merge looses unique rows and join creates duplicate and partially filled columns.

Dataframe1
Key  Col1  Col2  Col3
A    1     2     3
B    2     4     6

Dataframe2
Key  Col1  Col2  Col4
A    1     2     4
C    3     6     12
D    4     8     20

Merged Dataframe
Key  Col1  Col2  Col3  Col4
A    1     2     3     4
B    2     4     6      
C    3     6       12
D    4     8       20

akrun · Accepted Answer

We could bind the datasets with bind_rows and then do a group by summarise or reframe to return only the non-NA rows

library(dplyr)
bind_rows(df1, df2) %>% 
  group_by(Key) %>%
  reframe(across(everything(), ~ .x[!is.na(.x)][1]))

-output

# A tibble: 4 × 5
  Key    Col1  Col2  Col3  Col4
      
1 A         1     2     3     4
2 B         2     4     6    NA
3 C         3     6    NA    12
4 D         4     8    NA    20

Or may use powerjoin

library(powerjoin)
power_full_join(df1, df2, by = "Key", conflict = coalesce_xy) %>% 
    select(Key, order(names(.)[-1])+1)

-output

   Key Col1 Col2 Col3 Col4
1   A    1    2    3    4
2   B    2    4    6   NA
3   C    3    6   NA   12
4   D    4    8   NA   20

data

df1 <- structure(list(Key = c("A", "B"), Col1 = 1:2, Col2 = c(2L, 4L
), Col3 = c(3L, 6L)), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(Key = c("A", "C", "D"), Col1 = c(1L, 3L, 4L), 
    Col2 = c(2L, 6L, 8L), Col4 = c(4L, 12L, 20L)),
 class = "data.frame", row.names = c(NA, 
-3L))

Join two data frames with partially common and unequal columns

Answers (2)

data

Related Questions