Reputation: 453
I have a form that is filled in by multiple researchers. I now have (say) two forms that i want to join.
Form one:
site value1 Value2
1 A B
2 B C
3 NA NA
4 NA NA
5 A NA
Form two
site value1 value2
1 NA NA
2 NA NA
3 A A
4 B B
5 NA NA
expected output:
site value1 value2
1 A B
2 B B
3 A A
4 B B
5 A NA
I tried using the dplyr
package to try and join, but this does almost the same as dplyr::bind_cols
.
I prefer solving this within the tidyverse
. Help much appreciated!
Regards
Upvotes: 1
Views: 1571
Reputation: 47300
We can use {powerjoin}
library(powerjoin)
power_left_join(df1, df2, by = "site", conflict = coalesce_xy)
#> site value1 value2
#> 1 1 A B
#> 2 2 B C
#> 3 3 A A
#> 4 4 B B
#> 5 5 A <NA>
Upvotes: 1
Reputation: 886968
If the missing values are NA
, then coalesce
would be easier. Here, we assume that the 'site' values are the same and are in the same order in both datasets
library(dplyr)
df1 %>%
mutate(across(starts_with('value'), ~
coalesce(., df2[[cur_column()]])))
-output
# site value1 value2
#1 1 A B
#2 2 B C
#3 3 A A
#4 4 B B
#5 5 A <NA>
If the datasets have difference in 'site', then do a left
join and coalesce
between the .x
and .y
columns. The logic is to loop across
those columns that have .x
suffix in their names, within the loop, retrieve the column name (cur_column()
), replace the suffix with .y
, get
the value of that column and do the coalesce
library(dplyr)
df1 %>%
left_join(df2, by = 'site') %>%
mutate(across(ends_with('.x'), ~
coalesce(., get(str_replace(cur_column(), '\\.x$', '\\.y'))))) %>%
select(-ends_with('.y')) %>%
rename_with(~ str_remove(., '\\.x'), ends_with('.x'))
# site value1 value2
#1 1 A B
#2 2 B C
#3 3 A A
#4 4 B B
#5 5 A <NA>
df1 <- structure(list(site = 1:5, value1 = c("A", "B", NA, NA, "A"),
value2 = c("B", "C", NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(site = 1:5, value1 = c(NA, NA, "A", "B", NA),
value2 = c(NA, NA, "A", "B", NA)), class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 4