Marius

Reputation: 35

Subsetting a data frame by single words occurring in column names

I'm new to Stack Overflow and R in general, so I hope I don't violate any etiquette :)

So I have quite a big data frame of gene expression levels called expression and I would like to define subsets based on words that occur in the column names.

gene.adk1 gene.adk2 gene.adk3 gene.bas1 gene.bas2   etc
1         2         1         4         6

This is just a small example version of the data frame. What I want to do is define one subset containing only the columns that have "adk" in their name and another subset of the columns that have "bas" in their name.

What I did was to sort the column names alphabetically and look at my data frame to find out how many columns there are containing "adk" in their title. I then defined the subset by using the subset function:

adk <- subset.data.frame(expression, select = c(1:3))

Is there a more elegant way of doing this? Maybe by defining subsets using single words like "adk" in the column names?

Thanks in advance

Marius

Upvotes: 1

Views: 625

Answers (2)

akrun

Reputation: 887891

We can use grep to match the substrings 'adk' and 'bas' in the column names and select those columns:

adkexprs <- expression[grep('adk', names(expression))]
basexprs <- expression[grep('bas', names(expression))]

To make the match more exact, anchor the pattern to the full column name:

adkexprs <- expression[grep('^gene\\.adk\\d+$', names(expression))]
basexprs <- expression[grep('^gene\\.bas\\d+$', names(expression))]
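To illustrate why the anchored pattern can matter, here is a minimal sketch on a vector of column names. The names "gene.adk1.norm" and "badk.score" are invented for the illustration and are not part of the question's data.

```r
# Hypothetical column names; the last two "adk" entries are made up
# to show what a loose substring match would also pick up.
cols <- c("gene.adk1", "gene.adk2", "gene.adk1.norm", "badk.score", "gene.bas1")

# Loose match: any name containing "adk" anywhere.
loose <- grep("adk", cols, value = TRUE)

# Anchored match: exactly "gene.adk" followed by one or more digits.
strict <- grep("^gene\\.adk\\d+$", cols, value = TRUE)

print(loose)
print(strict)
```

With names like these, the loose pattern keeps all four names containing "adk", while the anchored one keeps only "gene.adk1" and "gene.adk2".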

grep returns the numeric indices of the matching column names, while grepl returns a logical vector. For selecting columns as above, that is the only difference:

adkexprs <- expression[grepl('adk', names(expression))]
basexprs <- expression[grepl('bas', names(expression))]
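As a small sketch of that difference, the following builds a toy data frame shaped like the one in the question (the values are only illustrative) and compares the two forms of indexing. The variable names idx, mask, by_index, by_mask and others are chosen here for the example.

```r
# Toy data frame mirroring the question's example (values are illustrative).
expression <- data.frame(
  gene.adk1 = 1, gene.adk2 = 2, gene.adk3 = 1,
  gene.bas1 = 4, gene.bas2 = 6
)

idx  <- grep("adk", names(expression))   # integer positions of the matches
mask <- grepl("adk", names(expression))  # one TRUE/FALSE per column

by_index <- expression[idx]   # select columns by position
by_mask  <- expression[mask]  # select columns by logical mask

# A logical mask can be negated directly to get the remaining columns.
others <- expression[!mask]

print(idx)
print(mask)
print(names(others))
```

Both by_index and by_mask hold the same three "adk" columns. The logical form is convenient when the complement is needed as well, since !mask works even when nothing matches.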

Or with select from dplyr:

library(dplyr)
adkexprs <- expression %>%
      select(matches('adk'))

basexprs <- expression %>%
      select(matches('bas'))

Upvotes: 1

Chris Ruehlemann

Reputation: 21440

Subset adk:

adk <- expression[grepl("\\.adk", names(expression))]

Subset bas:

bas <- expression[grepl("\\.bas", names(expression))]
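If there are many such words, the same grepl call could be wrapped in a loop over them. This is only a sketch under the assumption that the words of interest are known in advance; the words vector and the subsets list are names introduced here, and the data frame is a toy version of the one in the question.

```r
# Toy data frame mirroring the question's example (values are illustrative).
expression <- data.frame(
  gene.adk1 = 1, gene.adk2 = 2, gene.adk3 = 1,
  gene.bas1 = 4, gene.bas2 = 6
)

# Words to look for in the column names (assumed to be known up front).
words <- c("adk", "bas")

# One data frame per word; fixed = TRUE treats each word as a literal string.
subsets <- lapply(words, function(w) {
  expression[grepl(w, names(expression), fixed = TRUE)]
})
names(subsets) <- words

print(names(subsets$adk))
print(names(subsets$bas))
```

Each subset is then available by name, for example subsets$adk or subsets[["bas"]].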

Upvotes: 2
