rnso
rnso

Reputation: 24565

Why is there no variation in output of cochrans Q test from statsmodel library in this code?

I am using following code:

from statsmodels.stats.contingency_tables import cochrans_q 
res = cochrans_q([[1,4,5],[9,6,8]])
print(res)

The output is:

df          2
pvalue      0.36787944117144245
statistic   2.0

However, output remains same for [[10,4,5],[9,6,8]], [[55,88,77],[99,46,88]] etc.

The statsmodels documentation page is here and the Wikipedia page on Cochran's Q test is here.

Where is the problem and how can it be solved? Thanks for your help.

Upvotes: 0

Views: 578

Answers (2)

Josef
Josef

Reputation: 22897

cochrans_q takes binary data and not counts.

In statsmodels the documentation is often not very explicit, but the expected behavior can be seen from the unit tests.

The following unit test shows how to convert frequency data to the format required by statsmodels.

source: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/stats/tests/test_nonparametric.py#L190

def test_cochransq3():
    # another example compared to SAS
    # in frequency weight format
    dt = [('A', 'S1'), ('B', 'S1'), ('C', 'S1'), ('count', int)]
    dta = np.array([('F', 'F', 'F', 6),
                    ('U', 'F', 'F', 2),
                    ('F', 'F', 'U', 16),
                    ('U', 'F', 'U', 4),
                    ('F', 'U', 'F', 2),
                    ('U', 'U', 'F', 6),
                    ('F', 'U', 'U', 4),
                    ('U', 'U', 'U', 6)], dt)

    cases = np.array([[0, 0, 0],
                      [1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1],
                      [0, 1, 0],
                      [1, 1, 0],
                      [0, 1, 1],
                      [1, 1, 1]])
    count = np.array([ 6,  2, 16,  4,  2,  6,  4,  6])
    data = np.repeat(cases, count, 0)

    res = cochrans_q(data)
    assert_allclose([res.statistic, res.pvalue], [8.4706, 0.0145], atol=5e-5)

Upvotes: 1

Red
Red

Reputation: 27577

Here are the definitions of each result:


df: In statistics, the degrees of freedom (DF) indicate the number of independent values that can vary in an analysis without breaking any constraints. It is an essential idea that appears in many contexts throughout statistics including hypothesis tests, probability distributions, and regression analysis.


pvalue: In statistics, the p-value is the probability of obtaining results as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct.


statistic: A statistic (singular) or sample statistic is any quantity computed from values in a sample that is used for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis.


As you can see, the definitions have nothing to do with values of each element, it's the quantity that counts.

Your program retursn the same thing for
[[1,4,5],[9,6,8]], [[10,4,5],[9,6,8]], [[55,88,77],[99,46,88]], etc
because each list has 3 elements in them.

Cochran's Q Test is a non parametric way to find differences in matched sets of three or more frequencies or proportions. That means that when performing the Cochran's Q test, the values of each element don't matter.

Upvotes: 1

Related Questions