Reputation: 35
I'm new to Stack Overflow and to R in general, so I hope I don't violate any etiquette :)
So I have quite a big data frame of gene expression levels called expression, and I would like to define subsets based on words that occur in the column names.
gene.adk1 gene.adk2 gene.adk3 gene.bas1 gene.bas2 etc
        1         2         1         4         6
This is just a small example version of the data frame. What I want to do is define one subset containing only the columns that have "adk" in their name, and another subset of the columns that have "bas" in their name.
What I did was sort the column names alphabetically and look at my data frame to find out how many columns contain "adk" in their name. I then defined the subset using the subset function:
adk <- subset.data.frame(expression, select = c(1:3))
Is there a more elegant way of doing this? Maybe by defining subsets from a single word like "adk" in the column names?
Thanks in advance
Marius
Upvotes: 1
Views: 625
Reputation: 887891
We can use grep to match the substrings 'adk' and 'bas' in the column names and select those columns:
adkexprs <- expression[grep('adk', names(expression))]
basexprs <- expression[grep('bas', names(expression))]
To make the match more exact, anchor the pattern to the full column name:
adkexprs <- expression[grep('^gene\\.adk\\d+$', names(expression))]
basexprs <- expression[grep('^gene\\.bas\\d+$', names(expression))]
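To see why the anchored pattern can matter, here is a minimal sketch on a toy data frame. The values and the extra column gene.badk1 are made up for illustration and are not from the question; the point is only that a plain substring match would also pick up such a column.

```r
# Toy data frame; column names and values are hypothetical.
expression <- data.frame(
  gene.adk1 = 1, gene.adk2 = 2, gene.adk3 = 1,
  gene.bas1 = 4, gene.bas2 = 6,
  gene.badk1 = 9  # made-up column that also contains "adk"
)

# Plain substring match: also returns gene.badk1
loose <- names(expression)[grep("adk", names(expression))]

# Anchored match: "gene.adk" followed only by digits
strict <- names(expression)[grep("^gene\\.adk\\d+$", names(expression))]

print(loose)
print(strict)
```

If all column names follow the gene.<group><number> scheme, both patterns select the same columns, so the stricter one is only a safeguard.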
grep returns the numeric indices of the matching names, while grepl returns a logical vector with one element per name. For this kind of selection, that is the only difference:
adkexprs <- expression[grepl('adk', names(expression))]
basexprs <- expression[grepl('bas', names(expression))]
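A small sketch of the index-versus-logical distinction, using a made-up three-column data frame. Both forms select the same columns; one practical consequence of the distinction shows up when excluding columns with a pattern that matches nothing ("xyz" here is a hypothetical pattern).

```r
# Toy data frame; names and values are made up for illustration.
expression <- data.frame(gene.adk1 = 1, gene.adk2 = 2, gene.bas1 = 4)
cols <- names(expression)

idx  <- grep("adk", cols)   # integer positions of the matching names
mask <- grepl("adk", cols)  # one TRUE/FALSE per column name

# Both select the same columns
same <- identical(names(expression[idx]), names(expression[mask]))

# Excluding a pattern with no matches behaves differently:
none_idx <- grep("xyz", cols)                        # integer(0)
dropped_by_index <- expression[-none_idx]            # -integer(0) selects nothing
dropped_by_mask  <- expression[!grepl("xyz", cols)]  # keeps every column

print(same)
print(ncol(dropped_by_index))
print(ncol(dropped_by_mask))
```

So for dropping columns, the logical form with ! is usually the safer choice.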
Or with select from dplyr:
library(dplyr)
adkexprs <- expression %>%
  select(matches('adk'))
basexprs <- expression %>%
  select(matches('bas'))
Upvotes: 1
Reputation: 21440
Subset adk:
adk <- expression[grepl("\\.adk", names(expression))]
Subset bas:
bas <- expression[grepl("\\.bas", names(expression))]
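If there are many such groups, writing one line per group gets tedious. As a rough sketch, and assuming every column name follows the gene.<letters><digits> scheme from the question, the group label can be extracted from the names and used to build all subsets at once. The toy data and the names groups and subsets are made up for illustration.

```r
# Toy data frame mirroring the example in the question.
expression <- data.frame(
  gene.adk1 = 1, gene.adk2 = 2, gene.adk3 = 1,
  gene.bas1 = 4, gene.bas2 = 6
)

# Strip the "gene." prefix and the trailing digits to get the group label
groups <- sub("^gene\\.([a-z]+)\\d+$", "\\1", names(expression))

# Split the column names by group, then pull each set of columns out
subsets <- lapply(split(names(expression), groups),
                  function(cols) expression[cols])

print(names(subsets))
print(names(subsets$adk))
```

Each element of subsets is then an ordinary data frame, e.g. subsets$adk or subsets$bas. Names that do not fit the assumed scheme are left unchanged by sub and would end up in groups of their own.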
Upvotes: 2