Reputation: 3921
Say I have two data frames:
df1<- data.frame(id1=c('A','B','C','D','P'),
id2=c('P','H','Q','S','A'),weight=c(3,4,2,7,3), stringsAsFactors=FALSE)
df2<- data.frame(id1=c('A','H','Q','D','P'),
id2=c('P','B','C','S','Z'),var=c(2,1,2,2,1), stringsAsFactors=FALSE)
I want to join these two data frames by id1
and id2
but sometimes the records are switched in both tables. For instance, the second and third record of each frame should be the same and the output in the merged table should be:
B H 4 1
C Q 2 2
I thought about sorting first the columns and do the merge but this approach does not work because not all the records appear in both tables (even after sorting, you can have id1
and id2
switched). This is a toy example, but in the actual application id1
and id2
are long strings.
What's a way to tackle this task?
Upvotes: 0
Views: 60
Reputation: 121568
Here a solution by creating an intermediate colunm that combine both id's in a sorted way.
df1$key <- with(df1,mapply(function(x,y){
paste(sort(c(x,y)),collapse="")
},id1,id2))
df2$key <- with(df2,mapply(function(x,y){
paste(sort(c(x,y)),collapse="")
},id1,id2))
merge(df1,df2,by="key")
# key id1.x id2.x weight id1.y id2.y var
# 1 AP A P 3 A P 2
# 2 AP P A 3 A P 2
# 3 BH B H 4 H B 1
# 4 CQ C Q 2 Q C 2
# 5 DS D S 7 D S 2
Upvotes: 2