I have been searching for quite some time for an elegant solution to this problem, to no avail, so I decided to ask here.
I am using tidyverse and the gather function to convert a matrix containing intensity values from different samples into long format, in preparation for plotting with ggplot.
There are two types of annotation: 'row-based' annotation of the data, corresponding to genes, and 'column-based' annotation, corresponding to sample information. The column-based information is stored in a separate data frame.
Using gather, it is easy to convert the values and the row-based annotations to long format:
> library(tidyverse)
> df <- data.frame(annot=c("A", "B", "C", "D"), sample1=c(1,1,4,2), sample2=c(3,5,4,5))
> df
annot sample1 sample2
1 A 1 3
2 B 1 5
3 C 4 4
4 D 2 5
> df %>% gather(sample, value, -annot)
annot sample value
1 A sample1 1
2 B sample1 1
3 C sample1 4
4 D sample1 2
5 A sample2 3
6 B sample2 5
7 C sample2 4
8 D sample2 5
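For reference, tidyr 1.0.0 and later mark gather() as superseded in favour of pivot_longer(). The sketch below is a swapped-in alternative, not what the question uses: it assumes a recent tidyr, and the variable name long is illustrative only. pivot_longer() orders its output by original row (A/sample1, A/sample2, B/sample1, ...), so an arrange() call is added to get the same sample-by-sample order that gather() produces.

```r
library(dplyr)
library(tidyr)

df <- data.frame(annot = c("A", "B", "C", "D"),
                 sample1 = c(1, 1, 4, 2),
                 sample2 = c(3, 5, 4, 5))

# Every column except annot is a sample: its name goes to `sample`, its cells to `value`
long <- df %>%
  pivot_longer(-annot, names_to = "sample", values_to = "value") %>%
  # Reorder so rows are grouped by sample, matching gather()'s stacking order
  arrange(sample, annot)

print(long)
```

The result is a tibble rather than a plain data frame, which ggplot accepts just the same.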
The sample information is trickier. It is stored in a separate data frame:
> sample_info <- data.frame(sample=c("sample1", "sample2"), condition=c("infected", "uninfected"))
> sample_info
sample condition
1 sample1 infected
2 sample2 uninfected
The desired end result would look like the following:
annot sample value condition
1 A sample1 1 infected
2 B sample1 1 infected
3 C sample1 4 infected
4 D sample1 2 infected
5 A sample2 3 uninfected
6 B sample2 5 uninfected
7 C sample2 4 uninfected
8 D sample2 5 uninfected
I can achieve this by post-processing the long data frame after generating it, mapping each sample name to its condition row by row. I am looking for a neater solution, ideally using tidyverse. Does anyone know an elegant way to do this?
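As a point of comparison, the post-processing step does not need an explicit row-by-row loop. The sketch below is only one guess at what such a mapping could look like (the question does not show its code, and the name long is illustrative): base R's match() finds, for every row of the long data, the index of its sample in sample_info, and that index vector pulls out the conditions in one vectorised step.

```r
library(tidyr)

df <- data.frame(annot = c("A", "B", "C", "D"),
                 sample1 = c(1, 1, 4, 2),
                 sample2 = c(3, 5, 4, 5))
# stringsAsFactors = FALSE keeps the looked-up conditions as character strings
sample_info <- data.frame(sample = c("sample1", "sample2"),
                          condition = c("infected", "uninfected"),
                          stringsAsFactors = FALSE)

long <- gather(df, sample, value, -annot)

# For each row of `long`, find the matching row of sample_info and copy its condition
long$condition <- sample_info$condition[match(long$sample, sample_info$sample)]

print(long)
```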
The *_join functions from dplyr (loaded with tidyverse) are great for solving many problems that involve more than one data frame. Here, gather the data as before and then join the sample information on the shared sample column:
> df %>%
+   gather(sample, value, -annot) %>%
+   left_join(sample_info, by = 'sample')
annot sample value condition
1 A sample1 1 infected
2 B sample1 1 infected
3 C sample1 4 infected
4 D sample1 2 infected
5 A sample2 3 uninfected
6 B sample2 5 uninfected
7 C sample2 4 uninfected
8 D sample2 5 uninfected
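A self-contained version of the same pipeline, written as a script rather than a console session, is sketched below. It additionally assumes stringsAsFactors = FALSE for sample_info (relevant on R versions before 4.0, where strings became factors by default) so that both join keys are character columns. The partial_info, kept and dropped names are illustrative only; they show what happens when a sample is missing from the lookup table: left_join() keeps every row and fills condition with NA, while inner_join() drops the unmatched rows.

```r
library(dplyr)
library(tidyr)

df <- data.frame(annot = c("A", "B", "C", "D"),
                 sample1 = c(1, 1, 4, 2),
                 sample2 = c(3, 5, 4, 5))
# Character (not factor) join key, so both sides of the join have the same type
sample_info <- data.frame(sample = c("sample1", "sample2"),
                          condition = c("infected", "uninfected"),
                          stringsAsFactors = FALSE)

annotated <- df %>%
  gather(sample, value, -annot) %>%        # wide -> long; key column `sample` is character
  left_join(sample_info, by = "sample")    # add the condition for each sample
print(annotated)

# Lookup table that only describes sample1
partial_info <- sample_info[sample_info$sample == "sample1", ]
kept    <- df %>% gather(sample, value, -annot) %>% left_join(partial_info, by = "sample")
dropped <- df %>% gather(sample, value, -annot) %>% inner_join(partial_info, by = "sample")
```

Because left_join() preserves the rows and row order of its left-hand input, it is the natural choice when every measurement should survive into the plot, even if some samples lack annotation.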