Reputation: 365
I am working in R with a data.frame that contains biological information: the first column holds a biological pathway (let's call it X), and the second column holds a gene (Y) associated with that pathway.
> head(x)
pathway_vec V2
1 KEGG_N_GLYCAN_BIOSYNTHESIS ALG13
2 KEGG_N_GLYCAN_BIOSYNTHESIS DOLPP1
3 KEGG_N_GLYCAN_BIOSYNTHESIS RPN1
4 KEGG_N_GLYCAN_BIOSYNTHESIS ALG14
5 KEGG_N_GLYCAN_BIOSYNTHESIS MAN1B1
6 KEGG_N_GLYCAN_BIOSYNTHESIS ALG3
> tail(x)
pathway_vec V2
12792 KEGG_VIRAL_MYOCARDITIS MYH8
12793 KEGG_VIRAL_MYOCARDITIS MYH11
12794 KEGG_VIRAL_MYOCARDITIS FYN
12795 KEGG_VIRAL_MYOCARDITIS MYH10
12796 KEGG_VIRAL_MYOCARDITIS HLA-DRB1
12797 KEGG_VIRAL_MYOCARDITIS HLA-DRA
Question: how can I find all the pathways associated with a given gene? In other words, given a value in column Y (V2), how can I find all the values of X (pathway_vec) in the rows that contain that gene?
I also meant to ask: after I obtain a data frame such as:
library(dplyr)  # provides filter()
f1 <- function(dat, str1) {
  filter(dat, V2 == str1)  # keep only the rows for gene str1
}
> f1(x, "MYH8")
pathway_vec V2
1 KEGG_TIGHT_JUNCTION MYH8
2 KEGG_VIRAL_MYOCARDITIS MYH8
how can I put "KEGG_TIGHT_JUNCTION" and "KEGG_VIRAL_MYOCARDITIS" on the same line, i.e. in a single row for MYH8? Thanks!
Upvotes: 1
Reputation: 887981
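We can group by gene and collect each gene's unique pathways into a list column:
library(dplyr)
# one row per gene; pathway_vec becomes a list of that gene's unique pathways
x %>%
  group_by(V2) %>%
  summarise(pathway_vec = list(unique(pathway_vec)))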
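If one line of text per gene is preferred (the second part of the question), a minimal sketch is to replace list() with paste(..., collapse = ...). The data frame below is a toy stand-in for x, built from rows shown in the question, and the "; " separator and the column name pathways are arbitrary choices.

```r
library(dplyr)

# Toy stand-in for x (same columns); rows taken from the question's output,
# NOT the full KEGG table.
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS", "KEGG_N_GLYCAN_BIOSYNTHESIS"),
  V2 = c("MYH8", "MYH8", "FYN", "ALG13"),
  stringsAsFactors = FALSE
)

# One row per gene; all of its unique pathways pasted into a single string
by_gene <- x %>%
  group_by(V2) %>%
  summarise(pathways = paste(unique(pathway_vec), collapse = "; "))

by_gene
```

A character column is easier to print or write to CSV than a list column, at the cost of having to split the string again if the individual pathways are needed later.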
Or, to return only the rows for a single gene, wrap filter in a function:
f1 <- function(dat, str1) {
filter(dat, V2 == str1)
}
f1(x, "MYH8")
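The single-gene function can also return the pathways already collapsed onto one line. This is a sketch with a hypothetical helper name (pathways_for_gene) and a toy version of x; .env$gene is dplyr's pronoun for "the variable from the calling environment", used so that a column that happened to be called gene could not shadow the argument.

```r
library(dplyr)

# Toy stand-in for x (same columns); rows taken from the question's output,
# NOT the full KEGG table.
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS", "KEGG_N_GLYCAN_BIOSYNTHESIS"),
  V2 = c("MYH8", "MYH8", "FYN", "ALG13"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows for one gene, then paste its pathways
# into a single string. Returns "" when the gene is not present.
pathways_for_gene <- function(dat, gene, sep = "; ") {
  hits <- filter(dat, V2 == .env$gene)
  paste(unique(hits$pathway_vec), collapse = sep)
}

pathways_for_gene(x, "MYH8")
# [1] "KEGG_TIGHT_JUNCTION; KEGG_VIRAL_MYOCARDITIS"
```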
Upvotes: 2
Reputation: 79311
We could use subset (base R) or filter (dplyr) to select the rows for a gene:
ALG13 <- subset(x, V2 == "ALG13")
library(dplyr)  # needed for filter()
filter(x, V2 == "ALG13")
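When many genes will be looked up, a base-R alternative (not part of the answer above) is to build a gene-to-pathways lookup once with split(). This sketch uses a toy version of x; the names pathways_by_gene and one_line are illustrative.

```r
# Toy stand-in for x (same columns); rows taken from the question's output,
# NOT the full KEGG table.
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS", "KEGG_N_GLYCAN_BIOSYNTHESIS"),
  V2 = c("MYH8", "MYH8", "FYN", "ALG13"),
  stringsAsFactors = FALSE
)

# Named list: one element per gene, holding that gene's unique pathways
pathways_by_gene <- lapply(split(x$pathway_vec, x$V2), unique)

# Same information as one string per gene (named character vector)
one_line <- vapply(pathways_by_gene, paste, character(1), collapse = "; ")

pathways_by_gene[["MYH8"]]
one_line[["MYH8"]]
```

After the one-off split, each lookup is a name index into the list rather than a new scan of all rows.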
Upvotes: 2
Reputation: 2904
You can select all rows for which the condition holds using this syntax:
x[x$V2 == gene, ]
where gene is the gene symbol (a character string) you want to filter your data by.
So, for example, if you want all pathways for gene "MYH8", just write:
x.myh8 <- x[x$V2 == "MYH8", ]
The result is a filtered data frame.
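The same indexing can return just the pathway names as a vector, and %in% extends it to several genes at once. In this sketch the toy data frame and the genes vector are assumptions for illustration. %in% is used instead of == because == would recycle the shorter vector and compare element-wise, and because %in% returns FALSE rather than NA for missing values in V2 (NA in a row index produces a row of NAs).

```r
# Toy stand-in for x (same columns); rows taken from the question's output,
# NOT the full KEGG table.
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS", "KEGG_N_GLYCAN_BIOSYNTHESIS"),
  V2 = c("MYH8", "MYH8", "FYN", "ALG13"),
  stringsAsFactors = FALSE
)

# Pathways for one gene as a character vector, without dplyr
myh8_pathways <- unique(x$pathway_vec[x$V2 == "MYH8"])

# Rows for several genes at once: %in% tests membership for every row
genes <- c("MYH8", "FYN")
x_subset <- x[x$V2 %in% genes, ]

myh8_pathways
x_subset
```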
Upvotes: 1