Impossible9

Reputation: 101

Matching data from one data frame to another

Firstly, apologies if this question isn't phrased in the best way possible; I am new to this but have tried to make it clear. I am trying to achieve the following:

I have two data frames and am trying to take data from one of them and add it as a new column in the other. I have created an example of this below:

IDa <- c(1,2,3)
score1a <- c(5,10,1)
score2a <- c(NA,8,NA)
score3a <- c(NA,NA,13)

dfa <- data.frame(IDa,score1a,score2a,score3a)

IDb <- c(1,1,1,2,2,3)
timeb <- c(1,2,3,2,3,3)

dfb <- data.frame(IDb,timeb)

score1 corresponds to time 1, score2 to time 2, and score3 to time 3.

What I want to do is match each score to the appropriate time point, for the appropriate ID, and add it as an additional column in dfb.

Hence dfb would have an additional column containing 5, NA, NA, 8, NA, 13.

Hope that makes sense, thanks for any help with this!

Edit: I should add that, as you can see, the time points available in dfb don't necessarily line up with dfa. For example, data is recorded for ID=2 at time point 1 in dfa, but dfb has nowhere to put this (no row for ID=2, timeb=1), so I need to fill dfb as best as possible with the data in dfa.

Upvotes: 0

Views: 135

Answers (2)

talat

Reputation: 70336

Another option would be:

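library(dplyr)
library(tidyr)

# Reshape dfa to long form, turn "score1a" etc. into the numeric time point,
# then left-join onto dfb so that every row of dfb is kept
gather(dfa, Score, Val, -IDa) %>% 
  mutate(Score = as.numeric(gsub("[a-zA-Z]", "", Score))) %>% 
  left_join(dfb, ., by = c("IDb" = "IDa", "timeb" = "Score"))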

#  IDb timeb Val
#1   1     1   5
#2   1     2  NA
#3   1     3  NA
#4   2     2   8
#5   2     3  NA
#6   3     3  13

The steps are similar to akrun's answer but using different functions.
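
In more recent tidyr releases, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same steps with that function, not part of the original answer; the intermediate names long and res are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

# Example data from the question
dfa <- data.frame(IDa = c(1, 2, 3),
                  score1a = c(5, 10, 1),
                  score2a = c(NA, 8, NA),
                  score3a = c(NA, NA, 13))
dfb <- data.frame(IDb = c(1, 1, 1, 2, 2, 3),
                  timeb = c(1, 2, 3, 2, 3, 3))

# Long form: one row per (ID, score column); strip the letters from
# "score1a" etc. so that only the numeric time point remains
long <- pivot_longer(dfa, cols = -IDa, names_to = "Score", values_to = "Val") %>%
  mutate(Score = as.numeric(gsub("[a-zA-Z]", "", Score)))

# Keep every row of dfb and attach the matching score where one exists
res <- left_join(dfb, long, by = c("IDb" = "IDa", "timeb" = "Score"))
res
```

As in the gather() version, this assumes the only digits in the score column names are the time point.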

Upvotes: 2

akrun

Reputation: 887821

You can melt dfa to long form, convert the variable column so that it matches timeb, and then merge the result with dfb.

library(reshape2)
# as.numeric(factor(variable)) maps score1a, score2a, score3a to 1, 2, 3
merge(dfb, transform(melt(dfa, id.var='IDa', na.rm=TRUE),
                     variable=as.numeric(factor(variable))),
      by.x=c('IDb', 'timeb'), by.y=c('IDa', 'variable'), all.x=TRUE)
#  IDb timeb value
#1   1     1     5
#2   1     2    NA
#3   1     3    NA
#4   2     2     8
#5   2     3    NA
#6   3     3    13

Or change the score column names to 1:3 and then do the merge:

colnames(dfa)[-1] <- 1:3
merge(dfb, melt(dfa, id.var='IDa'),
        by.x=c('IDb', 'timeb'), by.y=c('IDa', 'variable'))
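
For comparison, the same lookup can be sketched in base R without reshaping, by indexing the score columns as a matrix. This is not part of the original answer. It assumes that the k-th score column of dfa holds time point k, and the names scores, row_idx and score are chosen here for illustration.

```r
# Example data from the question
dfa <- data.frame(IDa = c(1, 2, 3),
                  score1a = c(5, 10, 1),
                  score2a = c(NA, 8, NA),
                  score3a = c(NA, NA, 13))
dfb <- data.frame(IDb = c(1, 1, 1, 2, 2, 3),
                  timeb = c(1, 2, 3, 2, 3, 3))

# Score columns as a matrix: rows follow dfa$IDa, columns are time points 1..3
scores <- as.matrix(dfa[, -1])

# Row of dfa that corresponds to each row of dfb (NA if the ID is absent)
row_idx <- match(dfb$IDb, dfa$IDa)

# A two-column index matrix picks one (row, column) cell per row of dfb
dfb$score <- scores[cbind(row_idx, dfb$timeb)]
dfb
```

An ID in dfb that does not appear in dfa gets NA here, which is the same outcome as the all.x=TRUE merge.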

Upvotes: 2
