Reputation: 13
I am trying to merge two data frames by two columns where the values for gene1 or gene2 could be in either column
df1<-data.frame(gene_1=c('A','B','E'),
gene_2=c('B','C','C'),
value =c(0,1,1))
df2<-data.frame(gene_1=c('B','B','D'),
gene_2=c('A','C','E'),
value=c(1,0.5,1))
and the desired result is:
result<-data.frame(gene_1=c('A','B','D','E'),
gene_2=c('B','C','E','C'),
value1=c(0, 1, NA, 1),
value2=c(1, 0.5, 1, NA))
where it would merge the first row in the two dfs gene1 =A, gene2=B and gene1=B and gene2=A since what is gene1 vs gene2 is arbitrary.
thank you
Upvotes: 1
Views: 37
Reputation: 389047
You could sort gene_1
, gene_2
columns in both the datasets and then perform a full_join
.
library(dplyr)
df1 %>%
transmute(col1 = pmin(gene_1, gene_2),
col2 = pmax(gene_1, gene_2),
value1 = value) %>%
full_join(df2 %>%
transmute(col1 = pmin(gene_1, gene_2),
col2 = pmax(gene_1, gene_2),
value2 = value),
by = c('col1', 'col2'))
# col1 col2 value1 value2
#1 A B 0 1.0
#2 B C 1 0.5
#3 C E 1 NA
#4 D E NA 1.0
Upvotes: 1