AdrianP.

Reputation: 443

Standard Chi Squared Test in R?

I have observation counts for 4 genotypes in a single-copy region. I want to calculate the allele frequencies of these genotypes and then test whether these frequencies deviate significantly from the expected values of 25%:25%:25%:25%, using a chi-squared test in R.

So far, I got:

> a <- c(do.call(rbind, strsplit(as.character(gdr18[1,9]), ",")), as.character(gdr18[1,8]))
> a
[1] "27" "30" "19" "52"

Next I get total count:

> sum <- as.numeric(a[1]) + as.numeric(a[2]) + as.numeric(a[3]) + as.numeric(a[4])
> sum
[1] 128

Now frequencies:

> af1 <- as.numeric(a[1])/sum
> af2 <- as.numeric(a[2])/sum
> af3 <- as.numeric(a[3])/sum
> af4 <- as.numeric(a[4])/sum
> af1
[1] 0.2109375
> af2
[1] 0.234375
> af3
[1] 0.1484375
> af4
[1] 0.40625

This is where I am stuck. I want to know whether af1, af2, af3 and af4 deviate significantly from 0.25, 0.25, 0.25 and 0.25.

How do I do this in R?

Thank you, Adrian

EDIT:

Alright, I am trying chisq.test() as suggested:

> p <- c(0.25,0.25,0.25,0.25)
> af <- c(af1, af2, af3, af4)
> chisq.test(af, p=p)

        Chi-squared test for given probabilities

data:  af
X-squared = 0.146, df = 3, p-value = 0.9858

Warning message:
In chisq.test(af, p = p) : Chi-squared approximation may be incorrect

What is the warning message trying to tell me? Why would the approximation be incorrect?

To test this methodology, I picked values far from expected 0.25:

> af=c(0.001,0.200,1.0,0.5)
> chisq.test(af, p=p)

        Chi-squared test for given probabilities

data:  af
X-squared = 1.3325, df = 3, p-value = 0.7214

Warning message:
In chisq.test(af, p = p) : Chi-squared approximation may be incorrect

In this case the H0 is still not rejected, even though the values are pretty far off from the expected 0.25 values.

Upvotes: 0

Answers (1)

Yorgos

Reputation: 30445

observed <- c(27,30,19,52)
chisq.test(observed)

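Note that chisq.test() is given the raw counts here, not the proportions. The result (X-squared = 18.6875, df = 3, p-value = 0.0003172) indicates that counts this far or further from equal proportions would arise by chance alone about 0.03% of the time.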
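
As for the warning in the edit to the question: chisq.test() treats its first argument as counts, so passing proportions makes it behave as though the total sample size were 1. Every expected count is then 0.25, well below the usual rule of thumb of 5, which is what triggers the warning, and the statistic comes out 128 times smaller than it should. A minimal sketch that reproduces both numbers by hand (the names n, stat and stat_af are only illustrative):

```r
observed <- c(27, 30, 19, 52)
n <- sum(observed)                 # total count, 128
expected <- n * rep(0.25, 4)       # 32 expected in each category

# Pearson statistic computed by hand from the counts
stat <- sum((observed - expected)^2 / expected)
stat                               # 18.6875

# Upper-tail probability with 3 degrees of freedom;
# this should agree with chisq.test(observed)$p.value
pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

# The same formula applied to proportions: the implied sample size is 1
af <- observed / n
stat_af <- sum((af - 0.25)^2 / 0.25)
stat_af                            # about 0.146, i.e. stat / n
```

The second statistic matches the X-squared = 0.146 reported in the question, which is why H0 was never rejected there.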

If your null hypothesis is not equal proportions (25:25:25:25) but some other ratio, say 3:3:1:9, the expected counts are the total multiplied by each category's share of the ratio, and you pass the ratio to chisq.test() through p, with rescale.p=TRUE so that it is rescaled to sum to 1:

expected <- sum(observed)*c(3,3,1,9)/16

chisq.test(observed,p=c(3,3,1,9),rescale.p=TRUE)
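
To check what the call above does, the following sketch computes the 3:3:1:9 test by hand and compares it with chisq.test(). It assumes only base R; the names ratio, res and stat are illustrative.

```r
observed <- c(27, 30, 19, 52)
ratio <- c(3, 3, 1, 9)

# Rescale the ratio to probabilities that sum to 1
p <- ratio / sum(ratio)
expected <- sum(observed) * p      # 24 24 8 72

# With p already summing to 1, rescale.p is not needed
res <- chisq.test(observed, p = p)
res$expected                       # expected counts used by the test

# Pearson statistic by hand, for comparison with res$statistic
stat <- sum((observed - expected)^2 / expected)
stat                               # about 22.56
all.equal(unname(res$statistic), stat)
```

All expected counts here are at least 8, so the chi-squared approximation warning does not apply to this call.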

Upvotes: 3
