Agus camacho

Reputation: 898

Fisher exact test does not give expected results in R

I want to test whether the counts of A are greater than the counts of B. I'm trying to use the Fisher exact test, but it gives me different results depending on how I arrange the data. I don't know whether the problem comes from this particular dataset (too many zeros) or from the way the data are arranged.

First, I tried constructing a contingency table (m), following examples I found online.

       factor
 counts     A       B
      0   205       226
      1    33        29
      2    15        18
      3    13         8
      4     4         2
      5     5         1
      6     3         0
      7     2         0
      9     1         0
      12    2         0
      23    1         0

fisher.test(m, workspace = 200000, hybrid = FALSE,
control = list(), or = 1, 
alternative = "two.sided",
conf.int = TRUE, 
conf.level = 0.95,
simulate.p.value = T, B = 2000)    
#results: data:  m  pvalue = 0.1184    alternative hypothesis: two.sided

This gives a non-significant difference, which is totally unexpected given the data and the table. The dataset is too big and complicated to post here or simulate, but I can send it to anyone interested.

However, if I create a matrix from the contingency table...

classes=c(0,1,2,3,4,5,6,7,9,12,23)
A=c(205,33,15,13,4,5,3,2,1,2,1)
B=c(226,29,18,8,2,1,0,0,0,0,0)
m=as.matrix(data.frame(classes,A,B))
fisher.test(m, workspace = 200000, hybrid = FALSE,
control = list(), or = 1, 
alternative = "two.sided",
conf.int = TRUE, 
conf.level = 0.95,
simulate.p.value = T, B = 2000)
#results: data:  m p-value = 0.0004998 alternative hypothesis: two.sided 

Which is the correct procedure? If it's the first, how is it possible that such big differences are not significant?

Thanks

Upvotes: 2

Views: 2148

Answers (1)

IRTFM

Reputation: 263481

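That first item may well be an R contingency table (which is really a matrix in disguise), so its first "column" is actually a set of row names, not data. Your second matrix, by contrast, carries classes as a genuine third data column, so fisher.test is given an 11 x 3 table instead of an 11 x 2 one. When I make a data.frame with those row names, coerce it to a matrix, and pass it to fisher.test, I get the same result as when I make a matrix ... without the extra column: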

m=matrix( cbind(A,B),,2)
rownames(m)=classes

> m
   [,1] [,2]
0   205  226
1    33   29
2    15   18
3    13    8
4     4    2
5     5    1
6     3    0
7     2    0
9     1    0
12    2    0
23    1    0

> as.matrix(d)
     A   B
0  205 226
1   33  29
2   15  18
3   13   8
4    4   2
5    5   1
6    3   0
7    2   0
9    1   0
12   2   0
23   1   0
> fisher.test( as.matrix(d) )

    Fisher's Exact Test for Count Data

data:  as.matrix(d)
p-value = 0.1197
alternative hypothesis: two.sided

> fisher.test(m)

    Fisher's Exact Test for Count Data

data:  m
p-value = 0.1197
alternative hypothesis: two.sided
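
The two constructions can be compared directly. This is only a sketch: d is not defined in the transcript above, so the definition below is a guess at how it was built (classes used as row names only), and m_extra and m_ok are illustrative names. It also shows that 0.0004998 is simply the smallest p-value a simulation with B = 2000 can report, 1 / (B + 1).

```r
classes <- c(0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 23)
A <- c(205, 33, 15, 13, 4, 5, 3, 2, 1, 2, 1)
B <- c(226, 29, 18, 8, 2, 1, 0, 0, 0, 0, 0)

# The question's second construction: `classes` ends up as a data column,
# so the class labels themselves are treated as counts.
m_extra <- as.matrix(data.frame(classes, A, B))
dim(m_extra)  # 11 rows, 3 columns

# Assumed reconstruction of `d` (hypothetical): classes as row names only.
d <- data.frame(A, B, row.names = as.character(classes))
m_ok <- as.matrix(d)
dim(m_ok)     # 11 rows, 2 columns

# Smallest p-value reachable with simulate.p.value = TRUE and B = 2000.
min_sim_p <- 1 / (2000 + 1)
signif(min_sim_p, 4)
```

Note also that, per the fisher.test documentation, alternative, or and conf.int are only used for 2 x 2 tables, so for a table of this size the test is not directional regardless of those arguments.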

Please clarify your statistical (mis?)-understanding on this matter with your professor or the folks at CV.com. The minor numerical difference between your p-value (0.1184) and the two I showed (0.1197) arises because your first call set simulate.p.value = T, which estimates the p-value by Monte Carlo simulation, whereas my calls used the default exact computation. Part of the loss of power to detect what we both suspect should be a statistically significant difference comes from the long tail of those distributions with small counts, which fisher.test does not handle well. The statistical power is further diminished by the extra degrees of freedom. You would get more power by testing this as two exponential variates .... but that, too, is a matter for statistical discussion.
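
To illustrate the degrees-of-freedom point, here is a rough sketch that pools the sparse tail into a single "3+" category before testing. The cut-off at 3 and the names group and pooled are arbitrary choices made for illustration, not something from the question. Picking a cut-off after looking at the data is itself statistically questionable, so treat this as a way to see the effect rather than as a recommended analysis.

```r
classes <- c(0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 23)
A <- c(205, 33, 15, 13, 4, 5, 3, 2, 1, 2, 1)
B <- c(226, 29, 18, 8, 2, 1, 0, 0, 0, 0, 0)

# Hypothetical grouping: keep 0, 1 and 2 as they are, pool everything >= 3.
group <- cut(classes, breaks = c(-Inf, 0, 1, 2, Inf),
             labels = c("0", "1", "2", "3+"))

# Sum the counts of A and B within each group: an 11 x 2 table becomes 4 x 2.
pooled <- rowsum(cbind(A, B), group)
pooled

# Same test on the smaller table (fewer degrees of freedom).
fisher.test(pooled)
```

Whether pooling is defensible for this dataset is a statistical question rather than a programming one.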

Upvotes: 1
