BioProgram

Mann-Whitney-Wilcoxon test in R gives an error

I am trying to run a Mann-Whitney test across a large data set. Here is an excerpt of my input:

GeneID  GeneID-2    GeneName    TSS-ID  Locus-ID    TAp73fTfTAAdEmp TAp73fTfTFAdEmp TAp73fTfTJAdEmp TAp73fTfTAAdCre TAp73fTfTFAdCre TAp73fTfTJAdCre
ENSMUSG00000028180  ENSMUSG00000028180  Zranb2  TSS1050,TSS17719,TSS52367,TSS53246,TSS72833,TSS73222    3:157534159-157548390   11.32013333 11.66344    11.87956667 13.01974667 14.70944667 10.94043867
ENSMUSG00000028184  ENSMUSG00000028184  Lphn2   TSS23298,TSS2403,TSS74519   3:148815585-148989316   15.0983 15.09572    14.03578667 17.00742667 17.90735333 14.69675333
ENSMUSG00000028187  ENSMUSG00000028187  Rpf1    TSS66485    3:146506347-146521423   12.34542667 14.11470667 10.493766   14.57954    11.93746667 11.07405867
ENSMUSG00000028189  ENSMUSG00000028189  Ctbs    TSS36674,TSS72417   3:146450469-146465849   1.288003867 1.435658    1.959620667 1.427768    1.502116667 1.243928267
ENSMUSG00000020755  ENSMUSG00000020755  Sap30bp TSS14892,TSS218,TSS54781,TSS58430   11:115933281-115966725  31.91070667 31.68585333 26.86939333 39.05116667 30.62916667 27.22893333
ENSMUSG00000020752  ENSMUSG00000020752  Recql5  TSS26689,TSS42686,TSS60902,TSS75513,TSS9111 11:115892594-115933477  10.55415467 9.373216667 8.315984    7.255579333 7.022178    8.553787333
ENSMUSG00000020758  ENSMUSG00000020758  Itgb4   TSS23937,TSS28540,TSS29211,TSS34600,TSS36953,TSS4070,TSS6591,TSS68296   11:115974708-116008412  130.2124    117.3862    129.323 134.1108667 134.8743333 165.3330667
ENSMUSG00000069833  ENSMUSG00000069833  Ahnak   TSS54612    19:8989283-9076919  116.3223333 135.2628    130.1286    147.045 142.8164    127.2352
ENSMUSG00000033863  ENSMUSG00000033863  Klf9    TSS87300    19:23141225-23166911    23.23418667 27.46006    26.56143333 21.09004667 18.47022    16.63767333
ENSMUSG00000069835  ENSMUSG00000069835  Sat2    TSS71535,TSS9615    11:69622023-69623870    0.975045133 0.886760067 1.593631333 1.469496    1.2373384   1.292182733
ENSMUSG00000028233  ENSMUSG00000028233  Tgs1    TSS24151,TSS28446,TSS50213,TSS68499,TSS79096    4:3574874-3616619   4.221024667 4.212087333 4.160574    5.113266667 6.917347333 5.22148
ENSMUSG00000028232  ENSMUSG00000028232  Tmem68  TSS12134,TSS25773,TSS25778,TSS49743,TSS7797 4:3549040-3574853   4.048868    3.906129333 6.024607333 4.613682    6.292972    4.287184

I wrote the same script using t.test and it worked. However, replacing t.test with wilcox.test gives me this error:

Error in wilcox.test.default(x[i, 1:3], x[i, 4:6], var.equal = TRUE) : 
  'x' must be numeric

My code is:

library(preprocessCore)
err <- file("err.Rout", open = "wt")
sink(err, type = "message")   # send error messages to err.Rout
x <- read.table("Data.txt", row.names = 1, header = TRUE, sep = "\t", na.strings = "NA")
x <- x[, 5:ncol(x)]           # keep only the six expression columns
p <- matrix(0, nrow(x), 3)
for (i in 1:nrow(x)) {
  myTest <- try(wilcox.test(x[i, 1:3], x[i, 4:6], var.equal = TRUE))
  if (inherits(myTest, "try-error")) {
    p[i, 2] = 1
  } else {
    p[i, 2] = myTest$p.value
    num = rowMeans(x[i, 1:3], na.rm = FALSE)
    den = rowMeans(x[i, 4:6], na.rm = FALSE)
    ratio = num / den
    p[i, 1] = ratio
  }
}
p[, 3] = p.adjust(p[, 2], method = "none")
colnames(p) <- c("FoldChange", "p-value", "Adjusted-p")
write.table(p, file = "tmpPval-fold.txt", append = FALSE, quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
sink()

I'd appreciate your help with this. As I said, it worked perfectly when I used t.test instead of wilcox.test.

Answers (1)

IRTFM

There are (at least) two problems with your code at the moment, and one of them is the cause of that error. The object returned by x[i,1:3] is a one-row data.frame, which is a list, so it fails the is.numeric test inside wilcox.test. Try coercing it to a numeric vector:

wilcox.test(as.numeric(x[1,(1:3)]), as.numeric(x[1,(4:6)]), var.equal=TRUE)
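
A minimal sketch of the diagnosis and the fix, assuming a small made-up data frame `toy` (hypothetical, standing in for `x` after `x <- x[, 5:ncol(x)]`, with values rounded from the excerpt above). A one-row slice of a data.frame is still a data.frame, so `is.numeric` is FALSE; `as.numeric()` turns it into a plain numeric vector. The loop applies the same coercion to every row and drops `var.equal`.

```r
# Hypothetical toy data standing in for the six expression columns of x
toy <- data.frame(
  EmpA = c(11.32, 15.098, 12.35),
  EmpF = c(11.66, 15.096, 14.11),
  EmpJ = c(11.88, 14.036, 10.49),
  CreA = c(13.02, 17.007, 14.58),
  CreF = c(14.71, 17.907, 11.94),
  CreJ = c(10.94, 14.697, 11.07),
  row.names = c("Zranb2", "Lphn2", "Rpf1")
)

row1 <- toy[1, 1:3]
print(class(row1))       # "data.frame": a one-row data frame, i.e. a list
print(is.numeric(row1))  # FALSE, which is why wilcox.test.default stops

# Coerce each row slice to a plain numeric vector before testing
p <- matrix(NA_real_, nrow(toy), 2,
            dimnames = list(rownames(toy), c("FoldChange", "p-value")))
for (i in seq_len(nrow(toy))) {
  g1 <- as.numeric(toy[i, 1:3])
  g2 <- as.numeric(toy[i, 4:6])
  res <- wilcox.test(g1, g2)   # no var.equal: it has no meaning for a rank test
  p[i, "FoldChange"] <- mean(g1) / mean(g2)
  p[i, "p-value"] <- res$p.value
}
print(p)
```

unlist(toy[i, 1:3], use.names = FALSE) would work just as well as as.numeric() here; either way the test receives a numeric vector rather than a list.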

But what-the-F is var.equal doing in a call to a non-parametric test that makes no assumption of equal variance? (Actually it is silently swallowed by the `...` argument and ignored.) And how do you expect to get useful information from a test when you are only comparing 3 items against 3 items? That is never going to be "significant", or even particularly informative. I doubt that a t.test could be informative with 3 vs 3, but a non-parametric test based on the ordering of values is even less likely to give a statistical signal of "significance": with 3 vs 3 and no ties there are only choose(6, 3) = 20 possible rank arrangements, so the smallest two-sided exact p-value wilcox.test can return is 2/20 = 0.1.
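
To make the 3-versus-3 point concrete, a small sketch using made-up vectors (not data from the question): even two perfectly separated groups of three give an exact two-sided p-value of 0.1, so no row in this design can reach p < 0.05 with wilcox.test.

```r
# Number of distinct ways to split 6 ranks into two groups of 3
n_arrangements <- choose(6, 3)
print(n_arrangements)    # 20

# Perfectly separated (made-up) groups: the most extreme possible outcome
best <- wilcox.test(c(1, 2, 3), c(4, 5, 6))
print(best$p.value)      # 0.1, i.e. 2 / 20 for the two-sided exact test

# So no 3-vs-3 comparison can fall below the usual 0.05 cutoff
print(best$p.value < 0.05)   # FALSE
```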
