Reputation: 329
I have a dataframe 'df1' with a lot of columns, but the ones of interest are:
Number | Code |
---|---|
1 | |
2 | |
3 | |
10 | |
11 | AMRO |
4 | |
277 | |
2100 | BLPH |
And I have another dataframe 'df2' with a lot of columns, but the ones of interest are:
Number | Code |
---|---|
1 | AMCR |
2 | AMCR |
3 | BANO |
10 | BAEA |
12 | AMRO |
4 | NA |
277 | NA |
2100 | NA |
I want matching values in the 'Number' columns of 'df1' and 'df2' to lead to values in the 'Code' column in 'df2' to overwrite the 'Code' values in 'df1' as long as the 'Code' values in 'df2' don't contain an NA, so that the final result of 'df1' looks like:
Number | Code |
---|---|
1 | AMCR |
2 | AMCR |
3 | BANO |
10 | BAEA |
11 | AMRO |
4 | |
277 | |
2100 | BLPH |
Thank you for your help!
Upvotes: 1
Views: 172
Reputation: 1329
Here is a solution playing with dplyr
full_join
and inner_join
library(dplyr)
df1 %>%
full_join(df2) %>% na.omit() %>%
full_join(df1 %>% inner_join(df2)) %>%
filter(Number %in% df1$Number) %>%
arrange(Number)
#> Number Code
#> 1 1 AMCR
#> 2 2 AMCR
#> 3 3 BANO
#> 4 4 <NA>
#> 5 10 BAEA
#> 6 11 AMRO
#> 7 277 <NA>
#> 8 2100 BLPH
Upvotes: 1
Reputation: 887901
We can do
library(powerjoin)
power_left_join(df1, df2, by = "Number", conflict = coalesce)
-output
Number Code
1 1 AMCR
2 2 AMCR
3 3 BANO
4 10 BAEA
5 11 AMRO
6 4 <NA>
7 277 <NA>
8 2100 BLPH
Or to do an overwrite, use data.table
library(data.table)
setDT(df1)[df2, Code := fcoalesce(Code, i.Code), on = .(Number)]
-output
> df1
Number Code
<int> <char>
1: 1 AMCR
2: 2 AMCR
3: 3 BANO
4: 10 BAEA
5: 11 AMRO
6: 4 <NA>
7: 277 <NA>
8: 2100 BLPH
df1 <- structure(list(Number = c(1L, 2L, 3L, 10L, 11L, 4L, 277L, 2100L
), Code = c(NA, NA, NA, NA, "AMRO", NA, NA, "BLPH")),
class = "data.frame", row.names = c(NA,
-8L))
df2 <- structure(list(Number = c(1L, 2L, 3L, 10L, 12L, 4L, 277L, 2100L
), Code = c("AMCR", "AMCR", "BANO", "BAEA", "AMRO", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))
Upvotes: 2
Reputation: 79246
Here is an alternative approach using bind_cols
:
library(dplyr)
bind_cols(df1, df2) %>%
mutate(Code = coalesce(Code...2, Code...4)) %>%
select(Number = Number...1, Code)
Number Code
1 1 AMCR
2 2 AMCR
3 3 BANO
4 10 BAEA
5 11 AMRO
6 4 <NA>
7 277 <NA>
8 2100 BLPH
Upvotes: 1