Tayeb Mohammadi

Reputation: 1

Compute Chi-Square (χ²) Statistic by Comparing Different Expected Frequencies

I ran into a problem and I hope someone can help me.

Observed <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 249, 454, 54, 22, 3, 6, 2), ncol = 2, byrow = FALSE)

Expected <- matrix(c(1, 2, 3, 4, 5, 6, 8, 284, 358, 123, 17, 4), ncol = 2, byrow = FALSE)

I have two matrices like the ones above. In each, the first column holds the numerical values and the second column holds their frequencies. I want to merge the second columns of the two matrices, matched on the numerical values in the first column, so that I can compute the chi-square (χ²) statistic.
The resulting matrix should be as follows:

[image: the desired merged matrix of expected and observed frequencies]

I need to repeat this several times, comparing the expected frequencies with the observed frequencies each time.

Upvotes: 0

Views: 179

Answers (1)

Rui Barradas

Reputation: 76641

Use merge to join the matrices and substitute 0's for the NA's.

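Observed <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 249, 454, 54, 22, 3, 6, 2), ncol = 2, byrow = FALSE)
Expected <- matrix(c(1, 2, 3, 4, 5, 6, 8, 284, 358, 123, 17, 4), ncol = 2, byrow = FALSE)

# Full join on the value column (named V1 by default), then drop that key column
res <- merge(Expected, Observed, by = "V1", all = TRUE)[-1]
# Values present in only one matrix come back as NA; treat them as zero counts
res[is.na(res)] <- 0
names(res) <- c("Expected", "Observed")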
res
#>   Expected Observed
#> 1        8        9
#> 2      284      249
#> 3      358      454
#> 4      123       54
#> 5       17       22
#> 6        4        3
#> 7        0        6
#> 8        0        2

Created on 2022-10-20 with reprex v2.0.2

However, wherever the expected count is zero, the divisor in the chi-squared statistic is zero and the statistic is infinite:

sum((res$Observed - res$Expected)^2/res$Expected)
#[1] Inf
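To see where the infinity comes from, here is a minimal sketch that looks at each row's contribution separately. The vector names exp_full and obs_full are made up for this illustration and simply hold the counts from the full-join result above, typed in by hand.

```r
# Counts copied by hand from the full-join result (hypothetical variable names)
exp_full <- c(8, 284, 358, 123, 17, 4, 0, 0)
obs_full <- c(9, 249, 454, 54, 22, 3, 6, 2)

# Contribution of each row to the Pearson statistic
contrib <- (obs_full - exp_full)^2 / exp_full

# Rows with an expected count of zero divide by zero
zero_expected <- which(exp_full == 0)
zero_expected
contrib[zero_expected]
```

Only the rows that exist in Observed but not in Expected are affected; the other rows give finite contributions.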

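So don't do a full join. An inner join, which is merge's default, is the right one here; it returns the same rows as a left join on Expected because every value in Expected also appears in Observed.
The first test is computed by hand following the original Pearson formula. The other two are R's chisq.test results without and with simulated p-values. Note that chisq.test(res2) treats res2 as a 6 x 2 contingency table, which is why its statistic differs from the one computed by hand.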

# Inner join (merge's default): keep only values present in both matrices
res2 <- merge(Expected, Observed, by = "V1")[-1]
names(res2) <- c("Expected", "Observed")

chisq <- sum((res2$Observed - res2$Expected)^2/res2$Expected)
chisq
#> [1] 70.6093
df <- nrow(res2) - 1L
pchisq(chisq, df, lower.tail = FALSE)
#> [1] 7.652696e-14
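As a cross-check on the hand computation, chisq.test can also be run in goodness-of-fit mode by passing the observed counts as x and the expected counts as p. This is only a sketch: obs and expd are hypothetical names for the two columns of res2, typed in by hand, and rescale.p = TRUE rescales the expected counts to proportions, so the expected values are scaled to the observed total (791 rather than 794). The statistic therefore need not equal the hand-computed one exactly.

```r
# Columns of res2, copied by hand (hypothetical variable names)
obs  <- c(9, 249, 454, 54, 22, 3)
expd <- c(8, 284, 358, 123, 17, 4)

# Goodness-of-fit test: expected counts are rescaled to probabilities
gof <- chisq.test(x = obs, p = expd, rescale.p = TRUE)

gof$statistic  # Pearson statistic against the rescaled expected counts
gof$parameter  # degrees of freedom: number of categories minus one
gof$expected   # expected counts after rescaling to sum(obs)
```

Because one of the expected counts is below 5, R may still warn that the chi-squared approximation may be incorrect.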

chisq.test(res2)
#> Warning in chisq.test(res2): Chi-squared approximation may be incorrect
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  res2
#> X-squared = 41.384, df = 5, p-value = 7.849e-08

chisq.test(res2, simulate.p.value = TRUE, B = 2000)
#> 
#>  Pearson's Chi-squared test with simulated p-value (based on 2000
#>  replicates)
#> 
#> data:  res2
#> X-squared = 41.384, df = NA, p-value = 0.0004998

Created on 2022-10-20 with reprex v2.0.2
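Since the question mentions repeating the comparison several times, the hand computation could be wrapped in a small function. This is a sketch, not a definitive implementation: pearson_from_freq and its column names are made up for this example, and it assumes both inputs are two-column matrices of values and frequencies, as in the question.

```r
# Hypothetical helper: align two value/frequency matrices and compute
# Pearson's statistic by hand
pearson_from_freq <- function(expected, observed) {
  # Name the columns explicitly so the join key does not rely on "V1"
  e <- data.frame(value = expected[, 1], Expected = expected[, 2])
  o <- data.frame(value = observed[, 1], Observed = observed[, 2])
  # Inner join: keep only values present in both tables
  m <- merge(e, o, by = "value")
  stat <- sum((m$Observed - m$Expected)^2 / m$Expected)
  df <- nrow(m) - 1L
  list(table = m, statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

Observed <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 249, 454, 54, 22, 3, 6, 2), ncol = 2, byrow = FALSE)
Expected <- matrix(c(1, 2, 3, 4, 5, 6, 8, 284, 358, 123, 17, 4), ncol = 2, byrow = FALSE)

out <- pearson_from_freq(Expected, Observed)
out$statistic
out$df
out$p.value
```

The inner join also drops any value that appears in the expected matrix but not in the observed one. If such values should count as observed zeros instead, merge's all.x = TRUE argument could be used and the resulting NA's replaced with 0.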

Upvotes: 1
