Reputation: 443
I have samples of observation counts for 4 genotypes in a single copy region. I want to calculate the allele frequencies of these genotypes, and then test whether these frequencies deviate significantly from the expected values of 25%:25%:25%:25% using a chi-squared test in R.
So far, I have:
> a <- c(do.call(rbind, strsplit(as.character(gdr18[1,9]), ",")), as.character(gdr18[1,8]))
> a
[1] "27" "30" "19" "52"
Next I get the total count:
> total <- sum(as.numeric(a))
> total
[1] 128
Now the frequencies:
> af1 <- as.numeric(a[1])/total
> af2 <- as.numeric(a[2])/total
> af3 <- as.numeric(a[3])/total
> af4 <- as.numeric(a[4])/total
> af1
[1] 0.2109375
> af2
[1] 0.234375
> af3
[1] 0.1484375
> af4
[1] 0.40625
This is where I am stuck. I want to know whether af1, af2, af3 and af4 deviate significantly from 0.25, 0.25, 0.25 and 0.25.
How do I do this in R?
Thank you, Adrian
EDIT:
Alright, I am trying chisq.test() as suggested:
> af <- c(af1, af2, af3, af4)
> p <- c(0.25,0.25,0.25,0.25)
> chisq.test(af, p=p)
Chi-squared test for given probabilities
data: af
X-squared = 0.146, df = 3, p-value = 0.9858
Warning message:
In chisq.test(af, p = p) : Chi-squared approximation may be incorrect
What is the warning message trying to tell me? Why would the approximation be incorrect?
To test this methodology, I picked values far from expected 0.25:
> af=c(0.001,0.200,1.0,0.5)
> chisq.test(af, p=p)
Chi-squared test for given probabilities
data: af
X-squared = 1.3325, df = 3, p-value = 0.7214
Warning message:
In chisq.test(af, p = p) : Chi-squared approximation may be incorrect
In this case the H0 is still not rejected, even though the values are pretty far off from the expected 0.25 values.
Upvotes: 0
Views: 1851
Reputation: 30445
observed <- c(27,30,19,52)
chisq.test(observed)
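Note that chisq.test() expects the observed counts, not the proportions. With these counts the statistic is 18.6875 on 3 degrees of freedom, which indicates that a departure from equal frequencies at least this extreme would arise by chance alone about 0.03% of the time (p = 0.0003172).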
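As for the warning in your edit: a minimal sketch of what I believe is going on, assuming the counts from the question. chisq.test() treats its first argument as counts, so when it is given proportions that sum to 1, every expected "count" is 0.25, well below the usual rule of thumb of 5, and the function warns that the chi-squared approximation may be unreliable. The variable names (p0, res_counts, res_props, stat, pval) are mine, chosen for illustration.

```r
# Observed genotype counts from the question
observed <- c(27, 30, 19, 52)
p0 <- rep(0.25, 4)                       # null proportions

# Counts as input: expected counts are sum(observed) * p0, i.e. 32 each
res_counts <- chisq.test(observed, p = p0)
res_counts$expected

# Proportions as input: they are treated as counts that total 1,
# so each expected "count" is only 0.25 and the warning is triggered
# (suppressed here so the sketch runs quietly).
af <- observed / sum(observed)
res_props <- suppressWarnings(chisq.test(af, p = p0))
res_props$expected

# The same statistic and p-value computed by hand from the counts
stat <- sum((observed - 32)^2 / 32)
pval <- pchisq(stat, df = 3, lower.tail = FALSE)
```

The statistic computed from the proportions is the count-based statistic divided by the total, 18.6875/128, which is roughly the 0.146 you saw. That is why H0 was not rejected: the sample size had been thrown away.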
If your null hypothesis is not a 25:25:25:25 distribution across the four categories but, say, a 3:3:1:9 expectation, pass the expected ratio through p. With rescale.p=TRUE, chisq.test() rescales p to sum to 1, so the expected counts it uses are the ones computed explicitly here:
expected <- sum(observed)*c(3,3,1,9)/16  # expected counts: 24 24 8 72
chisq.test(observed,p=c(3,3,1,9),rescale.p=TRUE)
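If you want to convince yourself that the two routes agree, here is a small sketch that computes the statistic by hand from the explicit expected counts and compares it with what chisq.test() returns. The names ratio, stat, pval and res are only illustrative.

```r
# Observed counts and the hypothesised 3:3:1:9 ratio
observed <- c(27, 30, 19, 52)
ratio <- c(3, 3, 1, 9)

# Expected counts under the ratio: 24 24 8 72
expected <- sum(observed) * ratio / sum(ratio)

# Pearson chi-squared statistic and its p-value, computed by hand
stat <- sum((observed - expected)^2 / expected)
pval <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

# The same test via chisq.test(), letting it rescale the ratio
res <- chisq.test(observed, p = ratio, rescale.p = TRUE)

# Both comparisons should come back TRUE
all.equal(unname(res$statistic), stat)
all.equal(res$p.value, pval)
```

Here all expected counts are at least 8, so the warning about the approximation should not appear.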
Upvotes: 3