Reputation: 627
Can someone help me with scipy.stats.chisquare? I do not have a statistical / mathematical background, and I am learning scipy.stats.chisquare with this data set from https://en.wikipedia.org/wiki/Chi-squared_test
The Wikipedia article gives the table below as an example, stating the Chi-squared value based on it is approximately 24.6. I am to use scipy.stats to verify this value and calculate the associated p value.
The function that looks most relevant to me is documented here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html
As I am new to both statistics and scipy.stats.chisquare, I am not sure of the best approach: how should I enter the data from the provided table into the arrays, and do I need to supply the expected values from Wikipedia?
Upvotes: 7
Views: 14517
Reputation: 114811
That data is a contingency table. SciPy has the function scipy.stats.chi2_contingency that applies the chi-square test to a contingency table.
It is fundamentally just a regular chi-square test, but when applied to a contingency table, the expected frequencies are calculated under the assumption of independence (chi2_contingency does this for you), and the degrees of freedom depend on the number of rows and columns (chi2_contingency calculates this for you, too).
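To make the independence assumption concrete, here is a minimal sketch of the same calculation done by hand with NumPy. It is only an illustration of the standard formulas (expected count = row total × column total / grand total, degrees of freedom = (rows - 1) × (columns - 1)), not a copy of SciPy's internals; the variable names are my own.

```python
import numpy as np

# Observed counts from the table in the question.
table = np.array([[90, 60, 104, 95],
                  [30, 50, 51, 20],
                  [30, 40, 45, 35]])

row_totals = table.sum(axis=1)   # one total per row
col_totals = table.sum(axis=0)   # one total per column
grand_total = table.sum()

# Under independence, each expected count is
# (row total * column total) / grand total.
expected = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom for an r x c contingency table.
n_rows, n_cols = table.shape
dof = (n_rows - 1) * (n_cols - 1)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_stat = ((table - expected) ** 2 / expected).sum()

print(f"chi2 statistic: {chi2_stat:.5g}")
print(f"degrees of freedom: {dof}")
```

The statistic and degrees of freedom computed this way should agree with what chi2_contingency reports for the same table.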
Here's how you can apply the chi-square test to that table:
import numpy as np
from scipy.stats import chi2_contingency
table = np.array([[90, 60, 104, 95],
                  [30, 50, 51, 20],
                  [30, 40, 45, 35]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 statistic: {chi2:.5g}")
print(f"p-value: {p:.5g}")
print(f"degrees of freedom: {dof}")
print("expected frequencies:")
print(expected)
Output:
chi2 statistic: 24.571
p-value: 0.00040984
degrees of freedom: 6
expected frequencies:
[[ 80.53846154  80.53846154 107.38461538  80.53846154]
 [ 34.84615385  34.84615385  46.46153846  34.84615385]
 [ 34.61538462  34.61538462  46.15384615  34.61538462]]
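Since the question asked about scipy.stats.chisquare specifically, the sketch below shows one way to get the same result with that function. It assumes you pass the flattened observed and expected counts and adjust the degrees of freedom through the ddof parameter: chisquare uses k - 1 - ddof degrees of freedom for k cells, so with 12 cells and 6 degrees of freedom the adjustment works out to ddof = 5. That value is my own derivation for this table, not something stated in the Wikipedia article.

```python
import numpy as np
from scipy.stats import chi2_contingency, chisquare

table = np.array([[90, 60, 104, 95],
                  [30, 50, 51, 20],
                  [30, 40, 45, 35]])

# Get the expected frequencies and degrees of freedom from chi2_contingency.
chi2, p, dof, expected = chi2_contingency(table)

# chisquare assumes k - 1 degrees of freedom for k cells by default;
# ddof removes the extra ones so that k - 1 - ddof equals dof.
ddof = table.size - 1 - dof   # 12 - 1 - 6 = 5 for this table

stat, pvalue = chisquare(table.ravel(), f_exp=expected.ravel(), ddof=ddof)

print(f"chi2 statistic: {stat:.5g}")
print(f"p-value: {pvalue:.5g}")
```

For a contingency table, chi2_contingency is still the more convenient choice, because it works out the expected frequencies and the degrees of freedom for you.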
Upvotes: 14