nhern121
nhern121

Reputation: 3921

Particular join in R data frame

Say I have two data frames:

df1<- data.frame(id1=c('A','B','C','D','P'), 
                 id2=c('P','H','Q','S','A'),weight=c(3,4,2,7,3), stringsAsFactors=FALSE)

df2<- data.frame(id1=c('A','H','Q','D','P'), 
                 id2=c('P','B','C','S','Z'),var=c(2,1,2,2,1), stringsAsFactors=FALSE)

I want to join these two data frames by id1 and id2 but sometimes the records are switched in both tables. For instance, the second and third record of each frame should be the same and the output in the merged table should be:

B H 4 1
C Q 2 2

I thought about sorting first the columns and do the merge but this approach does not work because not all the records appear in both tables (even after sorting, you can have id1 and id2 switched). This is a toy example, but in the actual application id1 and id2 are long strings.

What's a way to tackle this task?

Upvotes: 0

Views: 60

Answers (1)

agstudy
agstudy

Reputation: 121568

Here a solution by creating an intermediate colunm that combine both id's in a sorted way.

df1$key <- with(df1,mapply(function(x,y){
  paste(sort(c(x,y)),collapse="")
  },id1,id2))

df2$key <- with(df2,mapply(function(x,y){
  paste(sort(c(x,y)),collapse="")
},id1,id2))

merge(df1,df2,by="key")

#   key id1.x id2.x weight id1.y id2.y var
# 1  AP     A     P      3     A     P   2
# 2  AP     P     A      3     A     P   2
# 3  BH     B     H      4     H     B   1
# 4  CQ     C     Q      2     Q     C   2
# 5  DS     D     S      7     D     S   2

Upvotes: 2

Related Questions