Christopher
Christopher

Reputation: 627

understanding scipy.stats.chisquare

Can someone help me with scipy.stats.chisquare? I do not have a statistical / mathematical background, and I am learning scipy.stats.chisquare with this data set from https://en.wikipedia.org/wiki/Chi-squared_test

The Wikipedia article gives the table below as an example, stating the Chi-squared value based on it is approximately 24.6. I am to use scipy.stats to verify this value and calculate the associated p value.

enter image description here

I have found what looks like the most likely formula solutions to help me here

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html

enter image description here

As I am new to statistics, and also the use of scipy.stats.chisquare I am just not sure of the best approach, and how best to enter the data from provided table into the arrays, and whether to supply expected values? from Wikipedia.

Upvotes: 7

Views: 14517

Answers (1)

Warren Weckesser
Warren Weckesser

Reputation: 114811

That data is a contingency table. SciPy has the function scipy.stats.chi2_contingency that applies the chi-square test to a contingency table. It is fundamentally just a reqular chi-square test, but when applied to a contingency table, the expected frequencies are calculated under the assumption of independence (chi2_contingency does this for you), and the degrees of freedom depends on the number of rows and columns (chi2_contingency calculates this for you, too).

Here's how you can apply the chi-square test to that table:

import numpy as np
from scipy.stats import chi2_contingency


table = np.array([[90, 60, 104, 95],
                  [30, 50,  51, 20],
                  [30, 40,  45, 35]])

chi2, p, dof, expected = chi2_contingency(table)

print(f"chi2 statistic:     {chi2:.5g}")
print(f"p-value:            {p:.5g}")
print(f"degrees of freedom: {dof}")
print("expected frequencies:")
print(expected)

Output:

chi2 statistic:     24.571
p-value:            0.00040984
degrees of freedom: 6
expected frequencies:
[[ 80.53846154  80.53846154 107.38461538  80.53846154]
 [ 34.84615385  34.84615385  46.46153846  34.84615385]
 [ 34.61538462  34.61538462  46.15384615  34.61538462]]

Upvotes: 14

Related Questions