Reputation: 704
I am trying to run a Mann-Whitney test across a large data set. Here is an excerpt of my input:
GeneID GeneID-2 GeneName TSS-ID Locus-ID TAp73fTfTAAdEmp TAp73fTfTFAdEmp TAp73fTfTJAdEmp TAp73fTfTAAdCre TAp73fTfTFAdCre TAp73fTfTJAdCre
ENSMUSG00000028180 ENSMUSG00000028180 Zranb2 TSS1050,TSS17719,TSS52367,TSS53246,TSS72833,TSS73222 3:157534159-157548390 11.32013333 11.66344 11.87956667 13.01974667 14.70944667 10.94043867
ENSMUSG00000028184 ENSMUSG00000028184 Lphn2 TSS23298,TSS2403,TSS74519 3:148815585-148989316 15.0983 15.09572 14.03578667 17.00742667 17.90735333 14.69675333
ENSMUSG00000028187 ENSMUSG00000028187 Rpf1 TSS66485 3:146506347-146521423 12.34542667 14.11470667 10.493766 14.57954 11.93746667 11.07405867
ENSMUSG00000028189 ENSMUSG00000028189 Ctbs TSS36674,TSS72417 3:146450469-146465849 1.288003867 1.435658 1.959620667 1.427768 1.502116667 1.243928267
ENSMUSG00000020755 ENSMUSG00000020755 Sap30bp TSS14892,TSS218,TSS54781,TSS58430 11:115933281-115966725 31.91070667 31.68585333 26.86939333 39.05116667 30.62916667 27.22893333
ENSMUSG00000020752 ENSMUSG00000020752 Recql5 TSS26689,TSS42686,TSS60902,TSS75513,TSS9111 11:115892594-115933477 10.55415467 9.373216667 8.315984 7.255579333 7.022178 8.553787333
ENSMUSG00000020758 ENSMUSG00000020758 Itgb4 TSS23937,TSS28540,TSS29211,TSS34600,TSS36953,TSS4070,TSS6591,TSS68296 11:115974708-116008412 130.2124 117.3862 129.323 134.1108667 134.8743333 165.3330667
ENSMUSG00000069833 ENSMUSG00000069833 Ahnak TSS54612 19:8989283-9076919 116.3223333 135.2628 130.1286 147.045 142.8164 127.2352
ENSMUSG00000033863 ENSMUSG00000033863 Klf9 TSS87300 19:23141225-23166911 23.23418667 27.46006 26.56143333 21.09004667 18.47022 16.63767333
ENSMUSG00000069835 ENSMUSG00000069835 Sat2 TSS71535,TSS9615 11:69622023-69623870 0.975045133 0.886760067 1.593631333 1.469496 1.2373384 1.292182733
ENSMUSG00000028233 ENSMUSG00000028233 Tgs1 TSS24151,TSS28446,TSS50213,TSS68499,TSS79096 4:3574874-3616619 4.221024667 4.212087333 4.160574 5.113266667 6.917347333 5.22148
ENSMUSG00000028232 ENSMUSG00000028232 Tmem68 TSS12134,TSS25773,TSS25778,TSS49743,TSS7797 4:3549040-3574853 4.048868 3.906129333 6.024607333 4.613682 6.292972 4.287184
I wrote the same script for a t-test and it worked. However, replacing the t-test with "wilcox" gives me this error:
Error in wilcox.test.default(x[i, 1:3], x[i, 4:6], var.equal = TRUE) : 'x' must be numeric
My code is:
library(preprocessCore)
err <-file("err.Rout", open="wt")
sink(err, type="message")
x <- read.table("Data.txt", row.names=1, header=TRUE, sep="\t", na.strings="NA")
x<-x[,5:ncol(x)]
p<-matrix(0,nrow(x),3)
for (i in 1:nrow(x)) {
myTest <- try(wilcox.test(x[i,1:3], x[i,4:6], var.equal=TRUE))
if (inherits(myTest, "try-error"))
{ p[i,2]=1 }
else
{p[i,2]=myTest$p.value; num=rowMeans(x[i,1:3], na.rm = FALSE); den=rowMeans(x[i,4:6], na.rm = FALSE); ratio=num/den; p[i,1]=ratio }
}
p[,3] = p.adjust(p[,2], method="none")
colnames(p) <- c("FoldChange", "p-value", "Adjusted-p")
write.table(p, file = "tmpPval-fold.txt", append = FALSE, quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
sink()
I'd appreciate your help in this matter. As I said, it works perfectly if I use the t-test instead of 'wilcox'.
Upvotes: 0
Views: 1978
Reputation: 263451
There are (at least) two problems with your code at the moment, and one of them is the cause of that error. The class of the object returned by x[i,1:3] is data.frame, which is a list object and fails the is.numeric test inside wilcox.test. Try coercing:
wilcox.test(as.numeric(x[1,(1:3)]), as.numeric(x[1,(4:6)]), var.equal=TRUE)
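To see why the coercion matters, here is a minimal sketch on a made-up two-row data frame. The column names and values are hypothetical stand-ins, not the asker's file. It shows that a single-row slice of a data frame is still a list, and that as.numeric turns it into the plain vector wilcox.test expects:

```r
# Hypothetical two-row data frame standing in for the six expression columns.
x <- data.frame(a1 = c(11.3, 15.1), a2 = c(11.7, 15.1), a3 = c(11.9, 14.0),
                b1 = c(13.0, 17.0), b2 = c(14.7, 17.9), b3 = c(10.9, 14.7))

# A one-row slice keeps the data.frame class, so it is not numeric.
slice_class <- class(x[1, 1:3])
slice_is_numeric <- is.numeric(x[1, 1:3])

# Coerce each slice to a plain numeric vector before calling the test.
grp1 <- as.numeric(x[1, 1:3])
grp2 <- as.numeric(x[1, 4:6])
res <- wilcox.test(grp1, grp2)
res$p.value
```

Another option, if every remaining column is numeric, would be to convert once with as.matrix(x) before the loop, since row slices of a numeric matrix are already numeric vectors.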
But what-the-F is var.equal doing in a call to a non-parametric test, which makes no assumption of equal variance? (What is actually happening is that it gets ignored.) And how do you expect to get useful information from a test when you are only comparing 3 items with 3 items? That is never going to be "significant" or even particularly informative. I doubt that a t.test could be informative when it is 3 vs 3, but a non-parametric test that is based on the ordering of values is going to be even less likely to give a statistical signal of "significance".
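The 3 vs 3 limitation can be checked directly. The sketch below uses made-up values (not the asker's data) in the most extreme ordering possible, with every value in one group below every value in the other. It compares the exact Mann-Whitney p-value with a simple count of arrangements, and runs a t-test on the same numbers for contrast:

```r
# Two perfectly separated groups of three: the most extreme ordering possible.
low  <- c(1, 2, 3)
high <- c(4, 5, 6)

# Exact two-sided Mann-Whitney p-value for this ordering.
p_min <- wilcox.test(low, high)$p.value

# The same number by counting: 2 extreme arrangements out of choose(6, 3) = 20.
p_count <- 2 / choose(6, 3)

# For contrast, a Welch t-test on the same illustrative values.
p_t <- t.test(low, high)$p.value

c(wilcoxon = p_min, counted = p_count, t_test = p_t)
```

With three values per group and no ties, the two-sided rank test cannot report a p-value below 0.1 however far apart the groups are, so it can never cross a 0.05 threshold, whereas the t-test on these particular values does.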
Upvotes: 1