merging two df in R where the values may be swapped in the two columns (ie col1 maybe present in col2 in the other df)

Question

I am trying to merge two data frames by two columns where the values for gene1 or gene2 could be in either column

df1<-data.frame(gene_1=c('A','B','E'),
                gene_2=c('B','C','C'),
                value =c(0,1,1))

df2<-data.frame(gene_1=c('B','B','D'),
                gene_2=c('A','C','E'),
                value=c(1,0.5,1))

and the desired result is:

   result<-data.frame(gene_1=c('A','B','D','E'),
                      gene_2=c('B','C','E','C'),
                      value1=c(0, 1, NA, 1),
                      value2=c(1, 0.5, 1, NA))

where it would merge the first row in the two dfs gene1 =A, gene2=B and gene1=B and gene2=A since what is gene1 vs gene2 is arbitrary.

thank you

Ronak Shah · Accepted Answer

You could sort gene_1, gene_2 columns in both the datasets and then perform a full_join.

library(dplyr)

df1 %>%
  transmute(col1 = pmin(gene_1, gene_2), 
            col2 = pmax(gene_1, gene_2), 
            value1 = value) %>%
  full_join(df2 %>%
  transmute(col1 = pmin(gene_1, gene_2), 
            col2 = pmax(gene_1, gene_2), 
            value2 = value), 
  by = c('col1', 'col2'))


#  col1 col2 value1 value2
#1    A    B      0    1.0
#2    B    C      1    0.5
#3    C    E      1     NA
#4    D    E     NA    1.0

merging two df in R where the values may be swapped in the two columns (ie col1 maybe present in col2 in the other df)

Answers (1)

Related Questions