R: Merge Data While Retaining Values for One Dataset in Duplicates

Question

I have two data sets, data1 and data2:

data1 <- data.frame(ID = 1:6,
                    A = c("a1", "a2", NA, "a4", "a5", NA),
                    B = c("b1", "b2", "b3", NA, "b5", NA),
                    stringsAsFactors = FALSE)
data1

ID  A   B
1   a1  b1      
2   a2  b2      
3   NA  b3      
4   a4  NA      
5   a5  b5
6   NA  NA

and

data2 <- data.frame(ID = 1:6,
                    A = c(NA, "a2", "a3", NA, "a5", "a6"),
                    B = c(NA, "b2.wrong", NA, "b4", "b5", "b6"),
                    stringsAsFactors = FALSE)
data2

ID  A   B
1   NA  NA
2   a2  b2.wrong        
3   a3  NA      
4   NA  b4      
5   a5  b5
6   a6  b6

I would like to merge them by ID so that the resultant merged dataset, data.merged, populates fields form both datasets, but chooses values from data1 whenever there are possible values from both datasets.

I.e., I would like the final dataset, data.merge, to be:

I have looked around, finding similar but not exact answers.

Ronak Shah · Accepted Answer

You can join the data and use coalesce to select the first non-NA value.

library(dplyr)

data1 %>%
  inner_join(data2, by = 'ID') %>%
  mutate(A = coalesce(A.x, A.y), 
         B = coalesce(B.x, B.y)) %>%
  select(names(data1))

#  ID  A  B
#1  1 a1 b1
#2  2 a2 b2
#3  3 a3 b3
#4  4 a4 b4
#5  5 a5 b5
#6  6 a6 b6

Or in base R comparing values with NA :

transform(merge(data1, data2, by = 'ID'), 
          A = ifelse(is.na(A.x), A.y, A.x),
          B = ifelse(is.na(B.x), B.y, B.x))[names(data1)]

R: Merge Data While Retaining Values for One Dataset in Duplicates

Answers (1)

Related Questions