Stevestingray
Stevestingray

Reputation: 453

Combining two dataframes with same columns in R

I have a form that is filled in by multiple researchers. I now have (say) two forms that i want to join.

Form one:

site   value1   Value2 
1      A          B
2      B          C 
3      NA         NA
4      NA         NA
5      A          NA

Form two

site   value1   value2 
1      NA         NA
2      NA         NA  
3      A          A
4      B          B
5      NA         NA

expected output:

site   value1   value2 
1      A          B
2      B          B 
3      A          A
4      B          B
5      A          NA

I tried using the dplyr package to try and join, but this does almost the same as dplyr::bind_cols.

I prefer solving this within the tidyverse. Help much appreciated!

Regards

Upvotes: 1

Views: 1571

Answers (2)

moodymudskipper
moodymudskipper

Reputation: 47300

We can use {powerjoin}

library(powerjoin)
power_left_join(df1, df2, by = "site", conflict = coalesce_xy)
#>   site value1 value2
#> 1    1      A      B
#> 2    2      B      C
#> 3    3      A      A
#> 4    4      B      B
#> 5    5      A   <NA>

Upvotes: 1

akrun
akrun

Reputation: 886968

If the missing values are NA, then coalesce would be easier. Here, we assume that the 'site' values are the same and are in the same order in both datasets

library(dplyr)
df1 %>%
  mutate(across(starts_with('value'), ~ 
               coalesce(., df2[[cur_column()]])))

-output

#   site value1 value2
#1    1      A      B
#2    2      B      C
#3    3      A      A
#4    4      B      B
#5    5      A   <NA>

If the datasets have difference in 'site', then do a left join and coalesce between the .x and .y columns. The logic is to loop across those columns that have .x suffix in their names, within the loop, retrieve the column name (cur_column()), replace the suffix with .y, get the value of that column and do the coalesce

library(dplyr)
df1 %>%
    left_join(df2, by = 'site') %>%
    mutate(across(ends_with('.x'), ~ 
               coalesce(., get(str_replace(cur_column(), '\\.x$', '\\.y'))))) %>%
    select(-ends_with('.y')) %>%
    rename_with(~ str_remove(., '\\.x'), ends_with('.x'))

#   site value1 value2
#1    1      A      B
#2    2      B      C
#3    3      A      A
#4    4      B      B
#5    5      A   <NA>

data

df1 <- structure(list(site = 1:5, value1 = c("A", "B", NA, NA, "A"), 
    value2 = c("B", "C", NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))

df2 <- structure(list(site = 1:5, value1 = c(NA, NA, "A", "B", NA), 
    value2 = c(NA, NA, "A", "B", NA)), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 4

Related Questions