Map two different data-frame with common row and combining them together

Question

This is the data-frame

A

 gene
 ENSG00000000938
 ENSG00000001617
 ENSG00000001630
 ENSG00000002330
 ENSG00000002726
 ENSG00000002822

The row which would be common across the other data frame is the third column

Expression dataframe

gene log2FoldChange
     ENSG00000000938       4.247333
     ENSG00000001617       3.381206
     ENSG00000001630       2.148626
     ENSG00000002330       2.876010
     ENSG00000002822       2.508205

> Region dataframe

gene log2FoldChange
     ENSG00000000938       8.247333
     ENSG00000001617       1.381206
     ENSG00000001630       6.148626
     ENSG00000002330       3.876010
     ENSG00000002726       1.602266

So i want to map the A with both the column expression dataframe and Region dataframe Such that I would like to to end up like this

My final data-frame should be looking something like this

Output expected

gene                log2FoldChange_Expression log2FoldChange_Region
 ENSG00000000938          4.247333                8.247333
 ENSG00000001617          3.381206                1.381206
 ENSG00000001630          2.148626                6.148626
 ENSG00000002330          2.876010                3.876010

The approach i tried is

Step 1 Inner join the dataframe A with Expression dataframe Step 2 Inner Join the datafameA with region dataframe

The trying to paste each dataframe in a single dataframe. Im not sure if this is the right approach

Wimpel · Accepted Answer

Here is a data.table approach, so make sure your datasets are in data.table format. If not, you can uncomment the setDT(...) line in the code below.

sample data

library(data.table)
library(tibble)
A <- fread("gene
 ENSG00000000938
 ENSG00000001617
 ENSG00000001630
 ENSG00000002330
 ENSG00000002726
 ENSG00000002822")
expression <- fread("gene log2FoldChange
     ENSG00000000938       4.247333
     ENSG00000001617       3.381206
     ENSG00000001630       2.148626
     ENSG00000002330       2.876010
     ENSG00000002822       2.508205")
region <- fread("gene log2FoldChange
     ENSG00000000938       8.247333
     ENSG00000001617       1.381206
     ENSG00000001630       6.148626
     ENSG00000002330       3.876010
     ENSG00000002726       1.602266")

code

#put data in a named list, 
L <- tibble::lst(A, expression, region)
#loop over the list's L indexes
L <- lapply(seq_along(L), function(x) {
  #if data is not yet in data.table format, convert it
  # setDT(L[[x]])  # <-!! uncomment if your data is not already data.table
  #set the key (gene) to join on
  setkey(L[[x]], gene)
  #if the data has more than two columns, add a suffix to the columns 2 and higher
  #based on the name if the data (derived from the list L's names)
  if (x > 1) {
    output <- setnames(L[[x]], 
             grep("^(?!gene)", names(L[[x]]), value = TRUE, perl = TRUE),
             paste0(grep("^(?!gene)", names(L[[x]]), value = TRUE, perl = TRUE), "_", names(L)[x]))
  } else {
  #else return the data as-is
    output <- L[[x]]
  }
  #return output
  return(output)
})
#step over list L, and merge them al together
#reduce join
Reduce( function(...) merge(..., all = TRUE), L )

output

#               gene log2FoldChange_expression log2FoldChange_region
# 1: ENSG00000000938                  4.247333              8.247333
# 2: ENSG00000001617                  3.381206              1.381206
# 3: ENSG00000001630                  2.148626              6.148626
# 4: ENSG00000002330                  2.876010              3.876010
# 5: ENSG00000002726                        NA              1.602266
# 6: ENSG00000002822                  2.508205                    NA

Map two different data-frame with common row and combining them together

Answers (2)

Related Questions