Reputation: 898
I want to test whether counts of A are greater than counts of B. I'm trying to use Fisher's exact test, but it gives me different results depending on how I arrange the data. I don't know whether the problem comes from this particular dataset (too many zeros) or from the way the data are arranged.
First, I tried constructing a contingency table (m), as explained on the internet.
factor
counts A B
0 205 226
1 33 29
2 15 18
3 13 8
4 4 2
5 5 1
6 3 0
7 2 0
9 1 0
12 2 0
23 1 0
fisher.test(m, workspace = 200000, hybrid = FALSE,
control = list(), or = 1,
alternative = "two.sided",
conf.int = TRUE,
conf.level = 0.95,
simulate.p.value = T, B = 2000)
#results: data: m pvalue = 0.1184 alternative hypothesis: two.sided
This gives a non-significant difference, which is totally unexpected when looking at the data and the table. The dataset is too big and complicated to post here or simulate, but I can send it to anyone interested.
However, if I build a matrix from the contingency table...
classes=c(0,1,2,3,4,5,6,7,9,12,23)
A=c(205,33,15,13,4,5,3,2,1,2,1)
B=c(226,29,18,8,2,1,0,0,0,0,0)
m=as.matrix(data.frame(classes,A,B))
fisher.test(m, workspace = 200000, hybrid = FALSE,
control = list(), or = 1,
alternative = "two.sided",
conf.int = TRUE,
conf.level = 0.95,
simulate.p.value = T, B = 2000)
#results: data: m p-value = 0.0004998 alternative hypothesis: two.sided
Which is the correct procedure? If it's the first, how is it possible that such big differences are not significant?
Thanks
Upvotes: 2
Views: 2148
Reputation: 263481
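That first item may well be an R contingency table (which is really a matrix in disguise), so that first "column" is actually a set of row names. In your second approach, as.matrix(data.frame(classes,A,B)) keeps classes as a real third column, so fisher.test treats the class values as if they were counts. When I make a data.frame (d) with those row names, coerce it to a matrix, and pass it to fisher.test,
I get the same result as when I make a matrix without the extra column: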
m=matrix( cbind(A,B),,2)
rownames(m)=classes
> m
[,1] [,2]
0 205 226
1 33 29
2 15 18
3 13 8
4 4 2
5 5 1
6 3 0
7 2 0
9 1 0
12 2 0
23 1 0
> as.matrix(d)
A B
0 205 226
1 33 29
2 15 18
3 13 8
4 4 2
5 5 1
6 3 0
7 2 0
9 1 0
12 2 0
23 1 0
> fisher.test( as.matrix(d) )
Fisher's Exact Test for Count Data
data: as.matrix(d)
p-value = 0.1197
alternative hypothesis: two.sided
> fisher.test(m)
Fisher's Exact Test for Count Data
data: m
p-value = 0.1197
alternative hypothesis: two.sided
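To make the difference between the two constructions concrete, here is a minimal sketch using the vectors from the question. The data.frame `d` is never defined in the output above, so the `row.names = as.character(classes)` construction is an assumed reconstruction, and the names `m3` and `m2` are introduced only for illustration. It may also help to know that with `simulate.p.value = TRUE` and `B = 2000`, the smallest p-value fisher.test can report is 1/(B + 1) = 1/2001, which is about 0.0004998, exactly the value obtained in the question's second call.

```r
classes <- c(0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 23)
A <- c(205, 33, 15, 13, 4, 5, 3, 2, 1, 2, 1)
B <- c(226, 29, 18, 8, 2, 1, 0, 0, 0, 0, 0)

# The question's second construction: 'classes' ends up as a third
# column, so the class values are treated as if they were counts.
m3 <- as.matrix(data.frame(classes, A, B))
dim(m3)       # 11 rows, 3 columns
colnames(m3)  # "classes" "A" "B"

# Assumed reconstruction of 'd': class values are row names, not data.
d <- data.frame(A, B, row.names = as.character(classes))
m2 <- as.matrix(d)
dim(m2)       # 11 rows, 2 columns

# Exact test on the 11 x 2 table of counts only.
res <- fisher.test(m2)
res$p.value
```

Checking `dim()` and `colnames()` before calling fisher.test is a cheap way to confirm that only the count columns are being tested.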
Please clarify your statistical (mis?)understanding of this matter with your professor or the folks at CV.com. The minor numerical difference between your p-value (0.1184) and the two I showed (0.1197) arises because you set simulate.p.value = T, so your p-value was estimated by Monte Carlo simulation, whereas mine come from the exact computation. Part of the loss of power to detect what we both suspect should be a statistically significant difference comes from the long tails of those distributions, where the counts are small; fisher.test does not handle them well. Furthermore, the statistical power is diminished by the extra degrees of freedom. You would get more power by testing this as two exponential variates, but that, too, is a matter for statistical discussion.
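One way to explore the point about the sparse tail and the extra degrees of freedom is to pool the rarely observed classes into a single row before testing. This is only a sketch of that idea, not the exponential-variate approach mentioned above: the cut-off of 4 is a hypothetical choice made purely for illustration, and picking a cut-off after looking at the data affects how the resulting p-value should be interpreted.

```r
classes <- c(0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 23)
A <- c(205, 33, 15, 13, 4, 5, 3, 2, 1, 2, 1)
B <- c(226, 29, 18, 8, 2, 1, 0, 0, 0, 0, 0)

m <- cbind(A, B)
rownames(m) <- classes

# Hypothetical cut-off: every class >= 4 is pooled into one "4+" row.
cutoff <- 4
grp <- ifelse(classes >= cutoff, paste0(cutoff, "+"), as.character(classes))

# rowsum() adds up the rows of m within each group;
# reorder = FALSE keeps the groups in order of first appearance.
pooled <- rowsum(m, group = grp, reorder = FALSE)
pooled

# Exact test on the smaller table (fewer rows, fewer degrees of freedom).
fisher.test(pooled)$p.value
```

The pooled table keeps the same column totals as the original, so no observations are dropped; only the resolution of the tail is reduced.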
Upvotes: 1