montyman14

Reputation: 65

Trouble Deciding How to Test for Variance in Bulk RNA sequencing Data

I have some bulk RNA sequencing data on which I need to run differential expression significance testing. I have two conditions, WT and KO, with two replicates each, giving me a data frame that looks like the following (the columns are in counts):

       WT1   WT2   KO1   KO2
 gene1 1.3   1.23  3.42  3.45
 gene2 2.6   2.54  1.22  1.21
 gene3 5.54  2.32  1.21  1.10 

My question is: how do I add a column on the right with a p-value for each gene, so that I can construct a volcano plot of the data? Specifically, which statistical test should I use to generate that column, and which R function performs it? I'm sorry if this isn't the kind of question I'm supposed to ask here, but frankly I didn't know where else to post. Thanks in advance!

Upvotes: 0

Views: 44

Answers (1)

montyman14

Reputation: 65

In case someone ends up caring about this question and I'm not just screaming into the ether (per the usual): I figured this out. For this kind of data I can use either a one-way ANOVA or a two-tailed t-test. With only two groups these give the same p-value, provided the t-test assumes equal variances (var.equal = TRUE); by default, t.test() in R runs Welch's t-test, which does not make that assumption. I decided to go with the t.test() function, as it's a little easier to understand if you're not very familiar with statistics in R. For a single gene, t.test() prints a summary that looks like this:

 Welch Two Sample t-test

 data:  bulk_data[1, 1:2] and bulk_data[1, 3:4]
 t = -0.93364, df = 1.1978, p-value = 0.5002
 alternative hypothesis: true difference in means is not equal to 0
 95 percent confidence interval:
  -0.3807992  0.3068266
 sample estimates:
  mean of x  mean of y 
 0.09525708 0.13224335 
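
As a rough check of the claim that a one-way ANOVA and a two-tailed t-test agree when there are only two groups, the sketch below runs both on the gene3 row from the question. The variable names (`expr`, `group`, `p_student`, `p_anova`, `p_welch`) are illustrative, not from the original post. Note that the match holds for the equal-variance form of the t-test; Welch's test, which `t.test()` runs by default, generally gives a different p-value.

```r
# gene3 from the question: two WT replicates followed by two KO replicates
expr  <- c(5.54, 2.32, 1.21, 1.10)
group <- factor(c("WT", "WT", "KO", "KO"))

# Student's t-test (equal variances assumed)
p_student <- t.test(expr ~ group, var.equal = TRUE)$p.value

# One-way ANOVA on the same two groups
p_anova <- summary(aov(expr ~ group))[[1]][["Pr(>F)"]][1]

# Welch's t-test, the default behaviour of t.test()
p_welch <- t.test(expr ~ group)$p.value

# Student and ANOVA agree up to numerical tolerance; Welch differs
print(all.equal(p_student, p_anova))
print(c(student = p_student, anova = p_anova, welch = p_welch))
```

With two replicates per group the two t-statistics are identical, so the difference between the Student and Welch p-values comes only from the degrees of freedom.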

I needed to extract the p-value from that result and store it in a fifth column of the data frame, so I used this loop:

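  for (i in seq_len(nrow(bulk_data))) {
   # Turn each one-row data frame slice into a plain numeric vector
   wt <- unlist(bulk_data[i, 1:2])
   ko <- unlist(bulk_data[i, 3:4])
   # Welch two-sample t-test; "res" avoids masking the base function t()
   res <- t.test(x = wt, y = ko, alternative = "two.sided")
   bulk_data[i, 5] <- res$p.value
  }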

This gave me a very nice list of p-values in the fifth column.
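
For anyone who wants to go from here to the volcano plot mentioned in the question, here is a minimal sketch, not a definitive pipeline. It rebuilds the example table from the question, computes the same per-gene Welch p-values with `apply()` instead of a loop, and adds the two quantities a volcano plot needs. The column names `p_value`, `padj`, and `log2FC` are my own choices, and the sketch assumes the values are on a linear (not already log-transformed) scale. The Benjamini-Hochberg adjustment via `p.adjust()` is an extra step that was not part of the loop above.

```r
# Example table from the question (assumed to be on a linear scale)
bulk_data <- data.frame(
  WT1 = c(1.3, 2.6, 5.54),
  WT2 = c(1.23, 2.54, 2.32),
  KO1 = c(3.42, 1.22, 1.21),
  KO2 = c(3.45, 1.21, 1.10),
  row.names = c("gene1", "gene2", "gene3")
)
wt <- c("WT1", "WT2")
ko <- c("KO1", "KO2")

# One Welch t-test per gene; apply() passes each row as a named numeric vector
bulk_data$p_value <- apply(bulk_data[, c(wt, ko)], 1, function(row) {
  t.test(row[wt], row[ko])$p.value
})

# Adjust for testing many genes at once (Benjamini-Hochberg)
bulk_data$padj <- p.adjust(bulk_data$p_value, method = "BH")

# log2 fold change of KO relative to WT, from the group means
bulk_data$log2FC <- log2(rowMeans(bulk_data[, ko]) / rowMeans(bulk_data[, wt]))

# Volcano plot: effect size on x, significance on y
plot(bulk_data$log2FC, -log10(bulk_data$p_value),
     xlab = "log2 fold change (KO / WT)",
     ylab = "-log10(p-value)")
```

Keep in mind that with only two replicates per condition, a per-gene t-test has very little power, so few genes are likely to stand out after adjustment.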

Upvotes: 0
