Reputation: 1572
I'm struggling to merge two data.frame with na values occuring in one or the other df.
sampleA <- structure(list(Nom_xp = "A1MRJ", Rep = 1L, GB05 = 102L, GB05.1 = 102L,
GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L,
GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = NA_integer_,
GB28.1 = NA_integer_, GB15 = 142L, GB15.1 = 144L, GB02 = 197L,
GB02.1 = 197L, GB10 = 126L, GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp",
"Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1",
"GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15",
"GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"
), row.names = 32L, class = "data.frame")
sampleB <- structure(list(Nom_xp = "A1MRJ", Rep = 2L, GB05 = NA, GB05.1 = NA,
GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L,
GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = 390L, GB28.1 = 390L,
GB15 = 142L, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 126L,
GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp",
"Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1",
"GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15",
"GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"
), row.names = 33L, class = "data.frame")
Output needed, as a data.frame. Only one line every for matching "Nom_xp", so the NA get's replaced by the values in either A or B, if the value exist in one or the other DF.
Nom_xp GB05 GB05 GB18 GB18 GB06 GB06 GB27 GB27 GB24 GB24 GB28 GB28 GB15 GB15 GB02 GB02 GB10 GB10 GB14 GB14
A1MRJ 102 102 177 177 240 240 169 169 240 242 390 390 142 144 197 197 126 134 181 193
I would've thought that :
output <- merge(A,B,by="Nom_xp",all.x=T,all.y=T)
or
output <- join(A,B,by="Nom_xp",match="all")
would give me what I need, but no luck so far... What am I missing ? Actual data.frame has more than one row.
Upvotes: 2
Views: 1184
Reputation: 1373
Not entierly sure on how your whole data set looks like but I assume you could have several samples with the same "Nom_xp" and not only 2? And that you probably have all your data in a big dataframe or such?
If so, maybe this code could be a good start (maybe someone can help out and re-write this much much more efficient?). Anyhow:
sampleA <- structure(list(Nom_xp = "A1MRJ", Rep = 1L, GB05 = 102L, GB05.1 = 102L,
GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L,
GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = NA_integer_,
GB28.1 = NA_integer_, GB15 = 142L, GB15.1 = 144L, GB02 = 197L,
GB02.1 = 197L, GB10 = 126L, GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp", "Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1","GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15","GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"), row.names = 32L, class = "data.frame")
sampleB <- structure(list(Nom_xp = "A1MRJ", Rep = 2L, GB05 = NA, GB05.1 = NA,
GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L,
GB27.1 = 169L, GB24 = 240L, GB24.1 = 242L, GB28 = 390L, GB28.1 = 390L,
GB15 = 142L, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 126L,
GB10.1 = 134L, GB14 = 181L, GB14.1 = 193L), .Names = c("Nom_xp","Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", "GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", "GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1" ), row.names = 33L, class = "data.frame")
sampleC <- structure(list(Nom_xp = "ASDF", Rep = 2L, GB05 = NA, GB05.1 = NA,
GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 12349L,
GB27.1 = 3, GB24 = 234112, GB24.1 = 242L, GB28 = 234, GB28.1 = 390L,
GB15 = NA, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 126L,
GB10.1 = 134L, GB14 = NA, GB14.1 = 193L), .Names = c("Nom_xp", "Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", "GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", "GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"), row.names = 34L, class = "data.frame")
sampleD <- structure(list(Nom_xp = "ASDF", Rep = 2L, GB05 = 214, GB05.1 = 34,
GB18 = 177L, GB18.1 = 177L, GB06 = 240L, GB06.1 = 240L, GB27 = 169L,
GB27.1 = 3, GB24 = NA, GB24.1 = 242L, GB28 = 234, GB28.1 = 390L,
GB15 = 56, GB15.1 = 144L, GB02 = 197L, GB02.1 = 197L, GB10 = 15466L,
GB10.1 = 134L, GB14 = 34, GB14.1 = 193L), .Names = c("Nom_xp", "Rep", "GB05", "GB05.1", "GB18", "GB18.1", "GB06", "GB06.1", "GB27", "GB27.1", "GB24", "GB24.1", "GB28", "GB28.1", "GB15", "GB15.1", "GB02", "GB02.1", "GB10", "GB10.1", "GB14", "GB14.1"), row.names = 35L, class = "data.frame")
cdat<-rbind(sampleA,sampleB,sampleC,sampleD) #simulating your data set (?)
dcols<-dim(cdat)[2]
mat<-matrix(nrow=length(unique(cdat$Nom_xp)),ncol=dcols)
colnames(mat)<-colnames(cdat)
for (j in 1:length(unique(cdat$Nom_xp)))
{
g<-grep(unique(cdat$Nom_xp)[j],cdat$Nom_xp) #Get the Nom_xp rows that match
mat[j,1]<-cdat[g[1],1] #Fill in the "Nom_xp"
mat[j,2]<-paste(g,collapse=" ") #Fill in the "rep"
mat[j,3:dcols]<-apply(cdat[g,3:dcols],2, #Calculate a mean for each column
function(x){as.numeric(mean(x,na.rm=T))})
}
Upvotes: 0
Reputation: 118879
Do you have just one row? Then, wouldn't this be sufficient? You can get the result in sampleB
as:
sampleB[, is.na(sampleB)] <- sampleA[, is.na(sampleB)]
No, apply, join and merge are not necessary here, I think. Not tested, but this would work.
sampleB[is.na(sampleB)] <- sampleA[is.na(sampleB)]
Upvotes: 1