Reputation: 21

R: obtaining p-values from t.test for each gene

I'm using the Bioconductor suite (ALL dataset) and trying to conduct a t.test for each gene. The goal is to view gene expression differences between the sexes. I can get the basic t.test down with the following:

> males <- exprs[, pData(ALL)$sex == "M"]
> females<-exprs[, pData(ALL)$sex == "F"]
> t.test(males, females)

But when I try the apply function to extract a p-value for each gene the command never ends, just keeps going in an endless loop (I think).

pvals=apply(exprs,1,function(x) {t.test(x[males],x[females])$p.value})

here is a sample of the males, there are 12625 rows (i.e., probe IDs).

> males
                                01005     01010     04006     04007     04008
1000_at                      7.597323  7.479445  7.384684  7.905312  7.065914
1001_at                      5.046194  4.932537  4.922627  4.844565  5.147762
1002_f_at                    3.900466  4.208155  4.206798  3.416923  3.945869
1003_s_at                    5.903856  6.169024  6.116890  5.687997  6.208061

Upvotes: 0

Answers (3)

Karolis Koncevičius

Reputation: 9656

If you are allowed to use an external package then:

library(matrixTests)
row_t_welch(exprs[, pData(ALL)$sex == "M"], exprs[, pData(ALL)$sex == "F"])

This is assuming the genes are written in rows.

Upvotes: 0

Oars

Reputation: 21

Thanks to Maurits I was able to use his code to answer my question. I also developed the following snipped that accomplished the task (I actually like Maurits's solution better but here is another way to complete the task:

> exprs<-exprs(ALL)
> pval<-numeric()
> p.dat<-pData(ALL)$sex
> r.sims<-nrow(exprs)
> for(gene in 1:r.sims) { 
+ gexprs<-exprs[gene,]
+ g.data<-data.frame(gexprs,p.dat)
+ ttest<-t.test(gexprs[p.dat=="M"],gexprs[p.dat=="F"])
+ pval[gene]<-ttest$p.value
+ }

Upvotes: 0

Maurits Evers

Reputation: 50668

Here is something to get you started. (At the risk of repeating myself;-) please note that this is more of a statistics/computational exercise than something you really should be doing; as explained in my comment, there exist sophisticated methods to characterise differential gene expression. A t-test (or ANOVA) is a very crude method in comparison.

We load the ALL library and data.
```
library(ALL)
data(ALL)
```
To characterise differences in mean probe intensities between male and female individuals we perform a two-sample two-sided t-test and store results in a list.
```
lst <- apply(exprs(ALL), 1, function(x)
    t.test(x[which(pData(ALL)$sex == "M")], x[which(pData(ALL)$sex == "F")]))
```

We extract the per-probe t statistic, difference in mean probe intensity and p-value, and store results in a a data.frame.

df <- do.call(rbind, lapply(lst, function(x) c(
    statistic = unname(x$statistic),
    diff = unname(diff(x$estimate)),
    pval = unname(x$p.value))))

We correct p-values for multiple hypothesis testing using the FDR method of Benjamini and Hochberg.
```
df <- transform(df, padj = p.adjust(pval, method = "BH"))
```

We inspect the first 10 rows of df sorted from smallest to largest adjusted p-value.

head(df[order(df$padj), ], n = 10)
#        statistic       diff         pval         padj
#37583_at  18.935092 -1.7717178 1.710570e-36 2.159594e-32
#38355_at  20.542586 -4.9979077 6.129942e-32 3.869526e-28
#41214_at  21.494496 -4.3233221 3.937217e-31 1.656912e-27
#34477_at  14.469711 -1.1639971 2.606867e-28 8.227924e-25
#35885_at  14.417265 -1.4006757 5.806146e-28 1.466052e-24
#38446_at -14.357159  2.3848176 1.956173e-21 4.116115e-18
#38182_at  11.052181 -0.7151076 1.140089e-19 2.056232e-16
#40097_at   9.401626 -0.5798433 8.801566e-16 1.388997e-12
#36321_at   9.208492 -0.6499951 1.823511e-15 2.557981e-12
#31534_at   8.939350 -0.5113203 1.077008e-14 1.359723e-11

We show results in a volcano plot

ggplot(df, aes(diff, -log10(padj))) +
    geom_point() +
    labs(x = "Difference in mean probe intensity", y = "Adjusted p-value")

Upvotes: 1

R: obtaining p-values from t.test for each gene

Answers (3)

Related Questions