Reputation: 11
I have a CSV file with several thousand samples whose gene expression after different treatments should be compared:
ID U1 U2 U3 H1 H2 H3
1 5.95918 6.07211 6.01437 5.89113 5.89776 5.95443
2 6.56789 5.98897 6.67844 5.78987 6.01789 6.12789
..
I was asked to do a Mann-Whitney U test, and R gives me results when I use this:
results <- apply(data, 1, function(x) {wilcox.test(x[1:3], x[4:6])$p.value})
However, I only get values like 0.1 or 0.5.
When I added alternative = "greater", I got values like 0.35000 or 0.05000, and a few samples got p-values like 0.14314 (that's a value I am okay with).
So I am wondering why R is giving me such strange p-values (0.35000, ...) and how I can fix it to get "normal" p-values.
Upvotes: 1
Views: 1967
Reputation: 132959
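You are doing a non-parametric test, where the test statistic is derived from the ranks. With a sample size of 3 per group, there are only choose(6, 3) = 20 ways to distribute the six ranks between the two groups, so the test statistic, and therefore the exact p-value, can take only a few distinct values.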
Example:
set.seed(42)
x <- matrix(rnorm(3000), ncol=6)
ps <- apply(x, 1, function(a) wilcox.test(a[1:3], a[4:6])$p.value)
table(ps)
#ps
#0.1 0.2 0.4 0.7   1
# 54  45 108 141 152
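To see where these values come from, here is a minimal sketch that enumerates every possible rank assignment for two groups of three, assuming no ties. The names first, p_two_sided and p_greater are only illustrative. The two-sided run should give the same five values as the simulation above, and the one-sided run should give a grid that includes the 0.05 and 0.35 from the question.

```r
# Each column of `first` is one way to pick the ranks of group 1 out of 1..6
# (20 possibilities in total); group 2 gets the remaining three ranks.
first <- combn(6, 3)

# Exact two-sided p-value for every rank assignment
p_two_sided <- apply(first, 2, function(idx) {
  wilcox.test(idx, setdiff(1:6, idx))$p.value
})

# Same enumeration with a one-sided alternative
p_greater <- apply(first, 2, function(idx) {
  wilcox.test(idx, setdiff(1:6, idx), alternative = "greater")$p.value
})

# Rounding only guards against floating-point noise when tabulating
table(round(p_two_sided, 10))
table(round(p_greater, 10))
```

So with n = 3 per group, the smallest achievable p-value is 0.1 two-sided and 0.05 one-sided, no matter how strong the effect is. P-values off this grid, like 0.14314, most likely come from rows with tied values: in that case wilcox.test cannot compute an exact p-value, warns about it, and falls back to a normal approximation.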
Upvotes: 5