Jakob37

Reputation: 77

Tidyverse gather with rowdata from other data frame

I have been searching for quite some time for an elegant solution to this problem, to no avail, so I decided to ask here.

I am using the tidyverse, and the gather function in particular, to convert a matrix of intensity values from different samples into long format in preparation for plotting with ggplot.

There are two types of annotation: 'row-based' annotation of the data, corresponding to genes, and 'column-based' annotation, corresponding to sample information. The column-based information is stored in a separate data frame.

Using gather, it is easy to convert the values and row-based annotations to long format.

> df <- data.frame(annot=c("A", "B", "C", "D"), sample1=c(1,1,4,2), sample2=c(3,5,4,5))
> df
  annot sample1 sample2
1     A       1       3
2     B       1       5
3     C       4       4
4     D       2       5
> df %>% gather(sample, value, -annot)
  annot  sample value
1     A sample1     1
2     B sample1     1
3     C sample1     4
4     D sample1     2
5     A sample2     3
6     B sample2     5
7     C sample2     4
8     D sample2     5

The sample information is trickier. It is stored in a separate data frame:

> sample_info <- data.frame(sample=c("sample1", "sample2"), condition=c("infected", "uninfected"))
> sample_info
   sample  condition
1 sample1   infected
2 sample2 uninfected

The desired end result would look like the following:

  annot  sample value  condition
1     A sample1     1   infected
2     B sample1     1   infected
3     C sample1     4   infected
4     D sample1     2   infected
5     A sample2     3 uninfected
6     B sample2     5 uninfected
7     C sample2     4 uninfected
8     D sample2     5 uninfected

I can achieve this by post-processing the long data frame, mapping each sample name to its condition row by row. I am looking for a neater solution, ideally using the tidyverse. Does anyone know an elegant way to achieve this?

Upvotes: 1

Views: 144

Answers (1)

C. Braun

Reputation: 5201

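The *_join functions from dplyr (loaded with tidyverse) are well suited to problems involving more than one data frame. Here, left_join keeps every row of the gathered data and adds the matching condition from sample_info, using the shared sample column as the key.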

> df %>%
      gather(sample, value, -annot) %>%
      left_join(sample_info, by = 'sample')

  annot  sample value  condition
1     A sample1     1   infected
2     B sample1     1   infected
3     C sample1     4   infected
4     D sample1     2   infected
5     A sample2     3 uninfected
6     B sample2     5 uninfected
7     C sample2     4 uninfected
8     D sample2     5 uninfected
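
For newer versions of tidyr (1.0.0 and later), gather is superseded by pivot_longer, so the same approach could probably be written as below. This is only a sketch under a few assumptions: the object name long is made up for illustration, stringsAsFactors = FALSE is set explicitly so the join key is a character column on both sides regardless of R version, and the result is a tibble whose rows are ordered by gene (A/sample1, A/sample2, ...) rather than by sample as with gather.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question; stringsAsFactors = FALSE keeps
# the key columns as character vectors on older R versions too.
df <- data.frame(annot = c("A", "B", "C", "D"),
                 sample1 = c(1, 1, 4, 2),
                 sample2 = c(3, 5, 4, 5),
                 stringsAsFactors = FALSE)
sample_info <- data.frame(sample = c("sample1", "sample2"),
                          condition = c("infected", "uninfected"),
                          stringsAsFactors = FALSE)

# 'long' is a hypothetical name: reshape every column except annot,
# then attach the per-sample annotation by the shared 'sample' column.
long <- df %>%
  pivot_longer(-annot, names_to = "sample", values_to = "value") %>%
  left_join(sample_info, by = "sample")

# Samples missing from sample_info would show up as NA in 'condition'.
anyNA(long$condition)
```

Because left_join never drops rows from the left table, checking the condition column for NA afterwards is a cheap way to spot samples that have no entry in sample_info.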

Upvotes: 3
