Reputation: 95
I'm working with a large frame that is comprised of many smaller frames that are constantly being updated. There is a row that had a unique identifier for all the values, however some values are NA since it is constantly being updated. Whenever I try merge the old frame and the new frame together, I get duplicate columns (ex. Age.x and Age.y) since I'm merging by the Id and nothing else.
Here is some data similar to what I am working with and what I am trying to get.
Frame1 <- data.frame(cbind("Id"=c("PS1001","PS1002","PS1003"), "Name"= c("Greg", "Paul", "Linda"), "Age"=c(14,NA, 43)))
Frame2 <- data.frame(cbind("Id"=c("PS1001","PS1002","PS1003", "PS1004"), "Name"= c(NA, "Paul", NA, "John"), "Age"=c(NA,51, 43, 31), "Housesize"= c(3,4,2,1)))
DesiredFrame <- data.frame(cbind("Id"=c("PS1001","PS1002","PS1003", "PS1004"), "Name"= c("Greg", "Paul", "Linda", "John"), "Age"=c(14,51, 43, 31), "Housesize"= c(3,4,2,1)))
I have code that I will show below for fixing this issue, as I have run into it on some of my smaller frames, but I have about 80 variables I need to do this on for my full frame, so writing this out 80+ times is not feasible. (I also use the data.table library and have to convert the frames to data tables for this to work)
library(data.table)
Table1 <- setDT(Frame1)
Table2 <- setDT(Frame2)
CombinedTable <- merge(Table1, Table2, by="Id", all.y=TRUE)
CombinedTable$Age.x <- as.character(CombinedTable$Age.x)
CombinedTable$Age.y <- as.character(CombinedTable$Age.y)
CombinedTable[, Age := ifelse(!is.na(Age.x), Age.x, Age.y)]
CombinedTable[, `:=` (Age.x=NULL, Age.y=NULL)]
Does anyone know of a simpler way to fix this issue? Whether that be a smoother merge or a way I could apply the singular fix I have listed above to all the variables in the frame.
Upvotes: 2
Views: 41
Reputation: 887501
We can use fcoalesce
after joining on
'Id'
library(data.table)
nm1 <- c('Name', 'Age')
setDT(Frame2)[Frame1, (nm1) := Map(fcoalesce,
mget(nm1), mget(paste0('i.', nm1))), on = .(Id)]
Upvotes: 1