wes
wes

Reputation: 113

R: Merge Data While Retaining Values for One Dataset in Duplicates

I have two data sets, data1 and data2:

data1 <- data.frame(ID = 1:6,
                    A = c("a1", "a2", NA, "a4", "a5", NA),
                    B = c("b1", "b2", "b3", NA, "b5", NA),
                    stringsAsFactors = FALSE)
data1

ID  A   B
1   a1  b1      
2   a2  b2      
3   NA  b3      
4   a4  NA      
5   a5  b5
6   NA  NA

and

data2 <- data.frame(ID = 1:6,
                    A = c(NA, "a2", "a3", NA, "a5", "a6"),
                    B = c(NA, "b2.wrong", NA, "b4", "b5", "b6"),
                    stringsAsFactors = FALSE)
data2

ID  A   B
1   NA  NA
2   a2  b2.wrong        
3   a3  NA      
4   NA  b4      
5   a5  b5
6   a6  b6

I would like to merge them by ID so that the resultant merged dataset, data.merged, populates fields form both datasets, but chooses values from data1 whenever there are possible values from both datasets.

I.e., I would like the final dataset, data.merge, to be:

ID  A   B
1   a1  b1      
2   a2  b2      
3   a3  b3      
4   a4  b4      
5   a5  b5      
6   a6  b6

I have looked around, finding similar but not exact answers.

Upvotes: 1

Views: 108

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388817

You can join the data and use coalesce to select the first non-NA value.

library(dplyr)

data1 %>%
  inner_join(data2, by = 'ID') %>%
  mutate(A = coalesce(A.x, A.y), 
         B = coalesce(B.x, B.y)) %>%
  select(names(data1))

#  ID  A  B
#1  1 a1 b1
#2  2 a2 b2
#3  3 a3 b3
#4  4 a4 b4
#5  5 a5 b5
#6  6 a6 b6

Or in base R comparing values with NA :

transform(merge(data1, data2, by = 'ID'), 
          A = ifelse(is.na(A.x), A.y, A.x),
          B = ifelse(is.na(B.x), B.y, B.x))[names(data1)]

Upvotes: 1

Related Questions