OKLM
OKLM

Reputation: 95

Combining frames with overlapping data in R

I'm working with a large frame that is comprised of many smaller frames that are constantly being updated. There is a row that had a unique identifier for all the values, however some values are NA since it is constantly being updated. Whenever I try merge the old frame and the new frame together, I get duplicate columns (ex. Age.x and Age.y) since I'm merging by the Id and nothing else.

Here is some data similar to what I am working with and what I am trying to get.

Frame1 <- data.frame(cbind("Id"=c("PS1001","PS1002","PS1003"), "Name"= c("Greg", "Paul", "Linda"), "Age"=c(14,NA, 43)))

Frame2 <- data.frame(cbind("Id"=c("PS1001","PS1002","PS1003", "PS1004"), "Name"= c(NA, "Paul", NA, "John"), "Age"=c(NA,51, 43, 31), "Housesize"= c(3,4,2,1)))

DesiredFrame <- data.frame(cbind("Id"=c("PS1001","PS1002","PS1003", "PS1004"), "Name"= c("Greg", "Paul", "Linda", "John"), "Age"=c(14,51, 43, 31), "Housesize"= c(3,4,2,1)))

I have code that I will show below for fixing this issue, as I have run into it on some of my smaller frames, but I have about 80 variables I need to do this on for my full frame, so writing this out 80+ times is not feasible. (I also use the data.table library and have to convert the frames to data tables for this to work)

library(data.table)
Table1 <- setDT(Frame1)
Table2 <- setDT(Frame2)
CombinedTable <- merge(Table1, Table2, by="Id", all.y=TRUE)
CombinedTable$Age.x <- as.character(CombinedTable$Age.x)
CombinedTable$Age.y <- as.character(CombinedTable$Age.y)
CombinedTable[, Age := ifelse(!is.na(Age.x), Age.x, Age.y)]
CombinedTable[, `:=` (Age.x=NULL, Age.y=NULL)] 

Does anyone know of a simpler way to fix this issue? Whether that be a smoother merge or a way I could apply the singular fix I have listed above to all the variables in the frame.

Upvotes: 2

Views: 41

Answers (1)

akrun
akrun

Reputation: 887501

We can use fcoalesce after joining on 'Id'

library(data.table)
nm1 <- c('Name', 'Age')
setDT(Frame2)[Frame1, (nm1) := Map(fcoalesce, 
       mget(nm1), mget(paste0('i.', nm1))), on =  .(Id)]

Upvotes: 1

Related Questions