mogcai

Reputation: 25

Different results in KS-test from Scipy and statsmodels

My code:

from scipy import stats
import statsmodels.api as sm
data=[-0.032400000000000005,-0.0358,-0.035699999999999996,-0.029500000000000002,-0.0227,-0.0146,-0.0125,-0.0103,-0.0182,-0.0137,-0.021099999999999997,-0.0327,-0.0279,-0.0325,-0.0252,-0.015700000000000002,-0.0148,-0.013999999999999999,-0.0137,-0.013500000000000002,-0.0042,0.0044,0.0212,0.027999999999999997,0.036699999999999997,0.0447,0.0524,0.056100000000000004,0.0519,0.0571,0.0424,0.045899999999999996,0.0496,0.053,0.0594,0.0712,0.0949,0.09050000000000001,0.0907,0.0616,0.0235,0.011000000000000001,-0.0103,0.0075,0.018799999999999997,0.0268,0.0383,0.0392,0.0546,0.0565,0.06509999999999999,0.0681,0.0622,0.061900000000000004,0.056900000000000006,0.0583,0.0495,0.053099999999999994,0.0612,0.0572,0.0636,0.0599,0.0582,0.0559,0.051,0.0491,0.0423,0.0373,0.0331,0.0226,0.0159,0.0144,0.0072,0.0106,0.0139,0.0204,0.026600000000000002,0.0311,0.0351,0.0294,0.028399999999999998,0.0262,0.0273,0.0256,0.024700000000000003,0.009399999999999999,-0.004,-0.0087,-0.0097,-0.0008,0.0083,0.01,0.0107,0.0132,0.0112]

print('scipy:')
print(stats.ks_1samp(data, stats.norm.cdf))

print('statsmodels:')
print(sm.stats.diagnostic.kstest_normal(data))

Result:

scipy:
KstestResult(statistic=0.48572091653418137, pvalue=3.628993889999382e-21)
statsmodels:
(0.0954414677540868, 0.039520654276486475)

Statistics Kingdom confirms statsmodels' result is correct. But why would scipy yield a different result?

Upvotes: 0

Views: 468

Answers (1)

Josef

Reputation: 22907

These are two different tests.

scipy's ks_1samp is a KS test against a fully specified distribution, i.e. one with no estimated parameters. In the example, the null hypothesis is that the data come from a standard normal distribution N(0, 1).

statsmodels' kstest_normal is a KS test with estimated parameters. The null hypothesis is that the data are normally distributed, i.e. come from the normal family with arbitrary mean and variance, both estimated from the sample. This is also known as the Lilliefors test (statsmodels provides lilliefors as an alias). https://www.statsmodels.org/dev/generated/statsmodels.stats.diagnostic.kstest_normal.html

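The test statistics differ because the data are compared against different reference distributions: N(0, 1) in scipy, and a normal with the sample mean and standard deviation in statsmodels. The asymptotic distribution of the test statistic also depends on whether the parameters are estimated, so the two hypothesis tests give different p-values even for the same statistic.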
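As a rough illustration of the difference, the sketch below passes the sample mean and standard deviation to scipy's ks_1samp, which should reproduce the statsmodels statistic but not its p-value. The synthetic sample, the seed, and the variable names are assumptions made for this example and are not the data from the question. The use of ddof=1 reflects my reading of how statsmodels standardizes the data.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import kstest_normal

# Synthetic sample (an assumption for illustration, not the question's data):
# roughly normal, but with mean and scale far from N(0, 1).
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.03, scale=0.03, size=95)

# Test 1: H0 is "data come from N(0, 1)" (fully specified distribution).
standard = stats.ks_1samp(sample, stats.norm.cdf)

# Test 2: same scipy function, but compared against a normal whose
# parameters are estimated from the sample (ddof=1 for the std).
mean, std = sample.mean(), sample.std(ddof=1)
fitted = stats.ks_1samp(sample, stats.norm.cdf, args=(mean, std))

# statsmodels' Lilliefors test: estimates the parameters itself and
# adjusts the p-value for that estimation.
sm_stat, sm_pvalue = kstest_normal(sample)

print("scipy vs N(0, 1):      ", standard.statistic, standard.pvalue)
print("scipy vs fitted normal:", fitted.statistic, fitted.pvalue)
print("statsmodels Lilliefors:", sm_stat, sm_pvalue)
```

The second and third statistics should agree, while their p-values differ, because scipy still treats the fitted parameters as if they had been known in advance. For testing normality with unknown mean and variance, the statsmodels p-value is the appropriate one.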

Upvotes: 1
