Reputation: 365

All the rows associated with a column in a data.frame

I am working in R with a data.frame that contains biological information: the first column names a biological pathway (call it X), and the second column holds a gene (Y) associated with that pathway.

> head(x)
                 pathway_vec     V2
1 KEGG_N_GLYCAN_BIOSYNTHESIS  ALG13
2 KEGG_N_GLYCAN_BIOSYNTHESIS DOLPP1
3 KEGG_N_GLYCAN_BIOSYNTHESIS   RPN1
4 KEGG_N_GLYCAN_BIOSYNTHESIS  ALG14
5 KEGG_N_GLYCAN_BIOSYNTHESIS MAN1B1
6 KEGG_N_GLYCAN_BIOSYNTHESIS   ALG3

> tail(x)
                 pathway_vec       V2
12792 KEGG_VIRAL_MYOCARDITIS     MYH8
12793 KEGG_VIRAL_MYOCARDITIS    MYH11
12794 KEGG_VIRAL_MYOCARDITIS      FYN
12795 KEGG_VIRAL_MYOCARDITIS    MYH10
12796 KEGG_VIRAL_MYOCARDITIS HLA-DRB1
12797 KEGG_VIRAL_MYOCARDITIS  HLA-DRA

Question: how can I find all the pathways associated with a gene? In other words, given a gene in the second column (Y), how can I find every row, and hence every pathway (X), that contains it?

I also meant to ask: once I have a function that returns a data frame like this:

f1 <- function(dat, str1) {
         filter(dat, V2 == str1)
}

    > f1(x, "MYH8")
                 pathway_vec   V2
    1    KEGG_TIGHT_JUNCTION MYH8
    2 KEGG_VIRAL_MYOCARDITIS MYH8

how can I put "KEGG_TIGHT_JUNCTION" and "KEGG_VIRAL_MYOCARDITIS" on the same line? Thanks!

Upvotes: 1

Answers (3)

akrun

Reputation: 887981

We can group by gene and collect the unique pathways for each one into a list column:

library(dplyr)
x %>%
    group_by(V2) %>%
    summarise(pathway_vec = list(unique(pathway_vec)))
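
For readers without dplyr, a rough base R analogue of the grouping above can be sketched with split(). This is not the code from the answer, only an illustration under assumptions: the small data frame below is made-up stand-in data using names from the question, and by_gene is a hypothetical variable name.

```r
# Toy stand-in for the real `x` (a few rows only, assumed for illustration)
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS", "KEGG_VIRAL_MYOCARDITIS"),
  V2 = c("MYH8", "MYH8", "FYN", "MYH8"),
  stringsAsFactors = FALSE
)

# Split the pathway column by gene: the result is a named list, one element per gene
by_gene <- split(x$pathway_vec, x$V2)

# Drop repeated pathways within each gene
by_gene <- lapply(by_gene, unique)

# Look up all pathways for one gene by name
print(by_gene[["MYH8"]])
```

The named list plays the same role as the list column in the dplyr result: each gene maps to a character vector of its pathways.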

Or, if we just want the rows for a single gene, wrap filter in a function:

f1 <- function(dat, str1) {
         filter(dat, V2 == str1)
}
f1(x, "MYH8")
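
To get the pathways for one gene onto a single line, one possible approach is to collapse them into one string with paste(). The sketch below uses base R only; pathways_on_one_line is a hypothetical helper name, and the data frame is a small made-up stand-in for x.

```r
# Toy stand-in for the real `x` (assumed for illustration)
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS"),
  V2 = c("MYH8", "MYH8", "FYN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all unique pathways for `gene`, joined into one string
pathways_on_one_line <- function(dat, gene, sep = ", ") {
  hits <- unique(dat$pathway_vec[dat$V2 == gene])
  paste(hits, collapse = sep)
}

one_line <- pathways_on_one_line(x, "MYH8")
print(one_line)
```

With dplyr, the same paste(unique(pathway_vec), collapse = ", ") expression could be used inside summarise() in place of list(...) to get one row per gene.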

Upvotes: 2

TarJae

Reputation: 79311

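We could use subset (base R) or filter (dplyr) to select the rows for a gene:

# base R; x is the data frame from the question
ALG13 <- subset(x, V2 == "ALG13")

# dplyr equivalent
library(dplyr)
filter(x, V2 == "ALG13")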

Upvotes: 2

Martin Wettstein

Reputation: 2904

You can select all rows for which the condition is met using this syntax:

x[x$V2 == gene, ]

where gene is the character string you want to filter your data by.

So, for example, if you want all pathways for gene "MYH8", just write:

x.myh8 = x[x$V2 == "MYH8", ]

The result is a filtered data.frame.
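
As a small runnable illustration of this kind of logical indexing, here is a sketch on made-up stand-in data (the real x has many more rows). Wrapping the condition in which() is an optional extra, not part of the answer: it skips rows where V2 is NA, which would otherwise come back as all-NA rows.

```r
# Toy stand-in for the real `x` (assumed for illustration), with one missing gene
x <- data.frame(
  pathway_vec = c("KEGG_TIGHT_JUNCTION", "KEGG_VIRAL_MYOCARDITIS",
                  "KEGG_VIRAL_MYOCARDITIS", "KEGG_N_GLYCAN_BIOSYNTHESIS"),
  V2 = c("MYH8", "MYH8", "FYN", NA),
  stringsAsFactors = FALSE
)

gene <- "MYH8"

# Rows whose gene matches; which() drops the NA comparison results
x.myh8 <- x[which(x$V2 == gene), ]
print(x.myh8)

# Only the pathway column, as a character vector
myh8_pathways <- x[which(x$V2 == gene), "pathway_vec"]
print(myh8_pathways)
```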

Upvotes: 1
