Lin Caijin
Lin Caijin

Reputation: 599

How to merge two columns in a data frame in R?

This is a data frame.

df <- tribble(
 ~ID, ~AKT1,  ~AKT3, ~BRCA1, ~BRCA2,
  1800018, FALSE,  TRUE, FALSE, FALSE,
  1800021, FALSE, FALSE, FALSE, FALSE,
  1800026, FALSE, FALSE, FALSE,  TRUE,
  1800027, FALSE, FALSE, FALSE, FALSE
)

Here is the description and category of the colnames.

gene_to_pathway <- tribble(
  ~Gene,                   ~Pathway,
  "AKT1",                      "PI3K",
  "AKT3",                      "PI3K",
  "BRCA1",          "Genome Integrity",
  "BRCA2",          "Genome Integrity"
)

What I want is to merge the columns which belong to the same pathway as stated in gene2pathway above.

What I want to get is a final data frame like this.

> df2
PI3K  Genome Integrity 
1800018  TRUE  FALSE 
1800021 FALSE  FALSE 
1800026 FALSE  TRUE  
1800027 FALSE  FALSE 

Any help is appreciated!

Upvotes: 2

Views: 99

Answers (1)

AnilGoyal
AnilGoyal

Reputation: 26218

library(dplyr)
library(tidyr)

df %>% pivot_longer(-ID) %>% left_join(gene2pathaway, by = c("name" = "Gene")) %>%
  group_by(ID, Pathway) %>% 
  summarise(value = as.logical(sum(value))) %>%
  pivot_wider(id_cols = ID, names_from = Pathway, values_from = value)

       ID `Genome Integrity` PI3K 
    <int> <lgl>              <lgl>
1 1800018 FALSE              TRUE 
2 1800021 FALSE              FALSE
3 1800026 TRUE               FALSE
4 1800027 FALSE              FALSE

Or you can write the function in pivot_wider directly thus saving two steps

df %>% pivot_longer(-ID) %>% 
  left_join(gene2pathaway, by = c("name" = "Gene")) %>%
  pivot_wider(id_cols = ID, names_from = Pathway, values_from = value, values_fn = function(x) as.logical(sum(x))) %>%
  column_to_rownames("ID")

         PI3K Genome Integrity
1800018  TRUE            FALSE
1800021 FALSE            FALSE
1800026 FALSE             TRUE
1800027 FALSE            FALSE

dputs used

df <- read.table(text = " ID   AKT1  AKT3 BRCA1 BRCA2
 1  1800018 FALSE  TRUE FALSE FALSE
 2  1800021 FALSE FALSE FALSE FALSE
 3  1800026 FALSE FALSE FALSE  TRUE
 4  1800027 FALSE FALSE FALSE FALSE", header = T)

gene2pathaway <- read.table(text = "        Gene           Pathway
      1      AKT1                      PI3K
      2      AKT3                      PI3K
      3     BRCA1          'Genome Integrity'
      4     BRCA2          'Genome Integrity'", header = T)

Upvotes: 4

Related Questions