Reputation: 599
This is a data frame.
df <- tribble(
~ID, ~AKT1, ~AKT3, ~BRCA1, ~BRCA2,
1800018, FALSE, TRUE, FALSE, FALSE,
1800021, FALSE, FALSE, FALSE, FALSE,
1800026, FALSE, FALSE, FALSE, TRUE,
1800027, FALSE, FALSE, FALSE, FALSE
)
Here is the description and category of the colnames.
gene_to_pathway <- tribble(
~Gene, ~Pathway,
"AKT1", "PI3K",
"AKT3", "PI3K",
"BRCA1", "Genome Integrity",
"BRCA2", "Genome Integrity"
)
What I want is to merge the columns which belong to the same pathway
as stated in gene2pathway
above.
What I want to get is a final data frame like this.
> df2
PI3K Genome Integrity
1800018 TRUE FALSE
1800021 FALSE FALSE
1800026 FALSE TRUE
1800027 FALSE FALSE
Any help is appreciated!
Upvotes: 2
Views: 99
Reputation: 26218
library(dplyr)
library(tidyr)
df %>% pivot_longer(-ID) %>% left_join(gene2pathaway, by = c("name" = "Gene")) %>%
group_by(ID, Pathway) %>%
summarise(value = as.logical(sum(value))) %>%
pivot_wider(id_cols = ID, names_from = Pathway, values_from = value)
ID `Genome Integrity` PI3K
<int> <lgl> <lgl>
1 1800018 FALSE TRUE
2 1800021 FALSE FALSE
3 1800026 TRUE FALSE
4 1800027 FALSE FALSE
Or you can write the function in pivot_wider
directly thus saving two steps
df %>% pivot_longer(-ID) %>%
left_join(gene2pathaway, by = c("name" = "Gene")) %>%
pivot_wider(id_cols = ID, names_from = Pathway, values_from = value, values_fn = function(x) as.logical(sum(x))) %>%
column_to_rownames("ID")
PI3K Genome Integrity
1800018 TRUE FALSE
1800021 FALSE FALSE
1800026 FALSE TRUE
1800027 FALSE FALSE
dputs used
df <- read.table(text = " ID AKT1 AKT3 BRCA1 BRCA2
1 1800018 FALSE TRUE FALSE FALSE
2 1800021 FALSE FALSE FALSE FALSE
3 1800026 FALSE FALSE FALSE TRUE
4 1800027 FALSE FALSE FALSE FALSE", header = T)
gene2pathaway <- read.table(text = " Gene Pathway
1 AKT1 PI3K
2 AKT3 PI3K
3 BRCA1 'Genome Integrity'
4 BRCA2 'Genome Integrity'", header = T)
Upvotes: 4