I ran into a problem and I hope someone can help me.
Observed<-matrix(c(1,2,3,4,5,6,7,8,9,249,454,54,22,3,6,2),ncol=2, byrow = F)
Expected<-matrix(c(1,2,3,4,5,6,8,284,358,123,17,4),ncol=2, byrow = F)
I have two matrices like the ones above, where the first column holds the numerical values and the second column holds their frequencies. I want to combine the second columns of the two matrices, matched on the numerical values, so that I can compute the chi-square (χ²) statistic.
In other words, the resulting matrix should look like this:
I repeat this several times, and each time I want to compare my expected frequencies with the observed frequencies.
Use merge to join the matrices and substitute 0's for the NA's.
Observed<-matrix(c(1,2,3,4,5,6,7,8,9,249,454,54,22,3,6,2),ncol=2, byrow = F)
Expected<-matrix(c(1,2,3,4,5,6,8,284,358,123,17,4),ncol=2, byrow = F)
merge(Expected, Observed, by = "V1", all = TRUE)[-1] -> res
res[] <- lapply(res, \(x) ifelse(is.na(x), 0, x))
names(res) <- c("Expected", "Observed")
res
#>   Expected Observed
#> 1        8        9
#> 2      284      249
#> 3      358      454
#> 4      123       54
#> 5       17       22
#> 6        4        3
#> 7        0        6
#> 8        0        2
Created on 2022-10-20 with reprex v2.0.2
But wherever the expected count is zero, the corresponding term of the chi-squared statistic divides by zero, so the statistic is infinite:
sum((res$Observed - res$Expected)^2/res$Expected)
#[1] Inf
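To see where the Inf comes from, the individual Pearson terms can be inspected row by row. This is a minimal sketch that rebuilds the full-join result from the answer; it uses res[is.na(res)] <- 0 instead of the lapply/ifelse line, which should give the same data frame here.

```r
# Rebuild the full-join result so this block runs on its own
Observed <- matrix(c(1,2,3,4,5,6,7,8,9,249,454,54,22,3,6,2), ncol = 2, byrow = FALSE)
Expected <- matrix(c(1,2,3,4,5,6,8,284,358,123,17,4), ncol = 2, byrow = FALSE)
res <- merge(Expected, Observed, by = "V1", all = TRUE)[-1]
res[is.na(res)] <- 0                      # values missing from Expected get count 0
names(res) <- c("Expected", "Observed")

# One Pearson term per row: (O - E)^2 / E
contrib <- (res$Observed - res$Expected)^2 / res$Expected
contrib

# Rows with E == 0 and O > 0 give a positive number divided by 0, i.e. Inf
# (a row with E == 0 and O == 0 would give 0/0, i.e. NaN)
which(is.infinite(contrib))
```

Rows 7 and 8 have an expected count of 0 but observed counts of 6 and 2, so a single such row is enough to make the sum infinite.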
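So don't do a full join. Keep only the values that have an expected count, i.e. a left join on Expected. Here every value in Expected also occurs in Observed, so the default (inner) join that merge performs below gives the same rows.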
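Since the question mentions repeating this several times, the left join can be wrapped in a small function. This is a sketch only: align_counts is a hypothetical helper, not something from the answer. It uses merge's all.x = TRUE argument to keep every value of the expected matrix, filling in an observed count of 0 where a value never occurred. Unlike the default inner join, this does not silently drop a category that happens to be absent from the observed data in some run.

```r
# Hypothetical helper: align observed counts to the values present in
# `expected` (a left join on the first column V1), filling gaps with 0.
align_counts <- function(expected, observed) {
  res <- merge(expected, observed, by = "V1", all.x = TRUE)
  # Values in `expected` that never occur in `observed` get an observed count of 0
  res$V2.y[is.na(res$V2.y)] <- 0
  data.frame(value = res$V1, Expected = res$V2.x, Observed = res$V2.y)
}

Observed <- matrix(c(1,2,3,4,5,6,7,8,9,249,454,54,22,3,6,2), ncol = 2, byrow = FALSE)
Expected <- matrix(c(1,2,3,4,5,6,8,284,358,123,17,4), ncol = 2, byrow = FALSE)

aligned <- align_counts(Expected, Observed)
aligned

# Pearson statistic on the aligned counts
chisq <- sum((aligned$Observed - aligned$Expected)^2 / aligned$Expected)
chisq
```

The merged columns are named V2.x and V2.y because both matrices are converted to data frames with columns V1 and V2 before joining.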
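The first test is computed by hand following the original Pearson formula, see here. The other two are R's chisq.test results without and with simulated p-values. Note that chisq.test converts a two-column data frame to a matrix and treats it as a 6 x 2 contingency table (a test of homogeneity between the two columns), not as observed versus expected counts, which is why its X-squared differs from the hand-computed value.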
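For reference, the hand computation below is Pearson's statistic over the k = 6 matched values, with O_i the observed and E_i the expected counts, and k - 1 degrees of freedom (matching df <- nrow(res2) - 1L). With the numbers from res2 it works out as:

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \mathrm{df} = k - 1 = 5

\chi^2 = \frac{1^2}{8} + \frac{35^2}{284} + \frac{96^2}{358} + \frac{69^2}{123} + \frac{5^2}{17} + \frac{1^2}{4}
       \approx 0.125 + 4.313 + 25.743 + 38.707 + 1.471 + 0.250 \approx 70.609
```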
merge(Expected, Observed, by = "V1")[-1] -> res2
names(res2) <- c("Expected", "Observed")
chisq <- sum((res2$Observed - res2$Expected)^2/res2$Expected)
chisq
#> [1] 70.6093
df <- nrow(res2) - 1L
pchisq(chisq, df, lower.tail = FALSE)
#> [1] 7.652696e-14
chisq.test(res2)
#> Warning in chisq.test(res2): Chi-squared approximation may be incorrect
#>
#> Pearson's Chi-squared test
#>
#> data: res2
#> X-squared = 41.384, df = 5, p-value = 7.849e-08
chisq.test(res2, simulate.p.value = TRUE, B = 2000)
#>
#> Pearson's Chi-squared test with simulated p-value (based on 2000
#> replicates)
#>
#> data: res2
#> X-squared = 41.384, df = NA, p-value = 0.0004998
Created on 2022-10-20 with reprex v2.0.2
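If the goal is to test the observed counts against the expected ones (rather than the two columns against each other), chisq.test also has a goodness-of-fit form: pass the observed counts as x and the expected counts as p with rescale.p = TRUE, so p is divided by its sum. A minimal sketch, assuming that is the intended test. Because the expected counts are then rescaled to sum(Observed) = 791 instead of the original total of 794, the statistic comes out close to, but not exactly equal to, the hand-computed 70.6093. R may also warn that the approximation may be incorrect, since one expected count is below 5.

```r
Observed <- matrix(c(1,2,3,4,5,6,7,8,9,249,454,54,22,3,6,2), ncol = 2, byrow = FALSE)
Expected <- matrix(c(1,2,3,4,5,6,8,284,358,123,17,4), ncol = 2, byrow = FALSE)
res2 <- merge(Expected, Observed, by = "V1")[-1]
names(res2) <- c("Expected", "Observed")

# Goodness-of-fit test: x = observed counts, p = expected proportions.
# rescale.p = TRUE lets the raw expected counts be passed as p;
# the expected counts used are then sum(x) * p / sum(p).
gof <- chisq.test(x = res2$Observed, p = res2$Expected, rescale.p = TRUE)
gof$statistic   # X-squared
gof$parameter   # df = length(x) - 1
gof$p.value
```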