Reputation: 25
Essentially, I am trying to make a series of plots with log2 fold-change on the y-axis and mean counts on the y-axis (the observation is genes). These are commonly called MA plots. The issue I am having is getting my data into the right form. I can do this through a loop, but would like to know the right way to do it.
At this point, I have two data frames: my design matrix and my data matrix. The design matrix looks like so (call it ED_df):
SampleID Patient Grade Batch
MD48L_2_B_L1 MD48 G2 Feb15
MD48R_3_B_L1 MD48 G3 Feb15
MD53L_2_B_L1 MD53 G2 Feb15
MD53R_3_B_L1 MD53 G3 Feb15
MD58L_2_B_L1 MD58 G2 Sep15
MD58R_3_B_L1 MD58 G3 Sep15
dim(ED_df)
# [1] 18 6
Each row is a unique sample. Each sample comes from patient+grade+batch. In this case, all patients are paired around grade (G2 or G3). There are 3 total batches. Two patients were replicated across either batch 1 and 2 or batch 2 and 3.
My data matrix looks like so (call it data_df):
Gene MD48L_2_B_L1 MD48R_3_B_L1 MD53L_2_B_L1 MD53R_3_B_L1 MD58L_2_B_L1
1 ENSG00000000003 364.26079 329.28730 531.52188 371.67413 275.745038
2 ENSG00000000005 18.92264 49.89201 42.18428 19.42548 1.948728
3 ENSG00000000419 270.59373 261.65590 284.74386 414.41018 293.283591
4 ENSG00000000457 145.70432 125.28439 122.33440 129.50318 148.103342
dim(data_df)
# [1] 31707 18
Each column corresponds to a unique sample.
What I am wanting to do is to get, for each gene, a log2 fold-change (G3/G2) within each patient-batch set. Additionally, I want to get mean (G3, G2) for each patient-batch set.
I will then plot this as an MA plot.
Again, I can see how to do this painfully through a nested for-loop, what I would like to do is figure out how to do this via some sort of aggregating function.
Upvotes: 0
Views: 416
Reputation: 4024
Two more steps: spread grade so G2 and G3 end up in different columns, then summarize. I'm not sure if I exactly understood the aggregation process you wanted, but I've taken a stab. I included the psych package for the gm (geometric mean) function. This is important when working with ratio data.
library(dplyr)
library(tidyr)
library(psych)
data_df %>%
as.data.frame %>%
gather(sample, measurement, -gene) %>%
left_join(ED_df) %>%
spread(Grade, measurement) %>%
group_by(Patient, Batch) %>%
summarize(G2_geometric_mean = G2 %>% gm,
G3_geometric_mean = G3 %>% gm) %>%
mutate(geometric_mean_ratio = G3_geometric_mean / G2_geometric_mean)
Upvotes: 1