Jacob
Jacob

Reputation: 329

R Overwrite column values with non NA values from column in separate dataframe

I have a dataframe 'df1' with a lot of columns, but the ones of interest are:

Number Code
1
2
3
10
11 AMRO
4
277
2100 BLPH

And I have another dataframe 'df2' with a lot of columns, but the ones of interest are:

Number Code
1 AMCR
2 AMCR
3 BANO
10 BAEA
12 AMRO
4 NA
277 NA
2100 NA

I want matching values in the 'Number' columns of 'df1' and 'df2' to lead to values in the 'Code' column in 'df2' to overwrite the 'Code' values in 'df1' as long as the 'Code' values in 'df2' don't contain an NA, so that the final result of 'df1' looks like:

Number Code
1 AMCR
2 AMCR
3 BANO
10 BAEA
11 AMRO
4
277
2100 BLPH

Thank you for your help!

Upvotes: 1

Views: 172

Answers (3)

Ruam Pimentel
Ruam Pimentel

Reputation: 1329

Here is a solution playing with dplyr full_join and inner_join

library(dplyr)
df1 %>% 
  full_join(df2) %>% na.omit() %>% 
  full_join(df1 %>% inner_join(df2)) %>% 
  filter(Number %in% df1$Number) %>%
  arrange(Number)

Output


#>   Number Code
#> 1      1 AMCR
#> 2      2 AMCR
#> 3      3 BANO
#> 4      4 <NA>
#> 5     10 BAEA
#> 6     11 AMRO
#> 7    277 <NA>
#> 8   2100 BLPH

Upvotes: 1

akrun
akrun

Reputation: 887901

We can do

library(powerjoin)
power_left_join(df1, df2, by = "Number", conflict = coalesce)

-output

Number Code
1      1 AMCR
2      2 AMCR
3      3 BANO
4     10 BAEA
5     11 AMRO
6      4 <NA>
7    277 <NA>
8   2100 BLPH

Or to do an overwrite, use data.table

library(data.table)
setDT(df1)[df2, Code := fcoalesce(Code, i.Code), on = .(Number)]

-output

> df1
   Number   Code
    <int> <char>
1:      1   AMCR
2:      2   AMCR
3:      3   BANO
4:     10   BAEA
5:     11   AMRO
6:      4   <NA>
7:    277   <NA>
8:   2100   BLPH

data

df1 <- structure(list(Number = c(1L, 2L, 3L, 10L, 11L, 4L, 277L, 2100L
), Code = c(NA, NA, NA, NA, "AMRO", NA, NA, "BLPH")), 
class = "data.frame", row.names = c(NA, 
-8L))

df2 <- structure(list(Number = c(1L, 2L, 3L, 10L, 12L, 4L, 277L, 2100L
), Code = c("AMCR", "AMCR", "BANO", "BAEA", "AMRO", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 2

TarJae
TarJae

Reputation: 79246

Here is an alternative approach using bind_cols:

library(dplyr)

bind_cols(df1, df2) %>% 
  mutate(Code = coalesce(Code...2, Code...4)) %>% 
  select(Number = Number...1, Code)

 Number Code
1      1 AMCR
2      2 AMCR
3      3 BANO
4     10 BAEA
5     11 AMRO
6      4 <NA>
7    277 <NA>
8   2100 BLPH

Upvotes: 1

Related Questions