Reputation: 25
My code:
from scipy import stats
import statsmodels.api as sm
data=[-0.032400000000000005,-0.0358,-0.035699999999999996,-0.029500000000000002,-0.0227,-0.0146,-0.0125,-0.0103,-0.0182,-0.0137,-0.021099999999999997,-0.0327,-0.0279,-0.0325,-0.0252,-0.015700000000000002,-0.0148,-0.013999999999999999,-0.0137,-0.013500000000000002,-0.0042,0.0044,0.0212,0.027999999999999997,0.036699999999999997,0.0447,0.0524,0.056100000000000004,0.0519,0.0571,0.0424,0.045899999999999996,0.0496,0.053,0.0594,0.0712,0.0949,0.09050000000000001,0.0907,0.0616,0.0235,0.011000000000000001,-0.0103,0.0075,0.018799999999999997,0.0268,0.0383,0.0392,0.0546,0.0565,0.06509999999999999,0.0681,0.0622,0.061900000000000004,0.056900000000000006,0.0583,0.0495,0.053099999999999994,0.0612,0.0572,0.0636,0.0599,0.0582,0.0559,0.051,0.0491,0.0423,0.0373,0.0331,0.0226,0.0159,0.0144,0.0072,0.0106,0.0139,0.0204,0.026600000000000002,0.0311,0.0351,0.0294,0.028399999999999998,0.0262,0.0273,0.0256,0.024700000000000003,0.009399999999999999,-0.004,-0.0087,-0.0097,-0.0008,0.0083,0.01,0.0107,0.0132,0.0112]
print('scipy:')
print(stats.ks_1samp(data, stats.norm.cdf))
print('statsmodels:')
print(sm.stats.diagnostic.kstest_normal(data))
Result:
scipy:
KstestResult(statistic=0.48572091653418137, pvalue=3.628993889999382e-21)
statsmodels:
(0.0954414677540868, 0.039520654276486475)
Statistics Kingdom confirms statsmodels' result is correct. But why would scipy yield a different result?
Upvotes: 0
Views: 468
Reputation: 22907
These are two different tests.
scipy's ks_1samp is a KS test against a fully specified distribution, i.e. one with no estimated parameters. In the example, the null hypothesis is that the data come from a standard normal distribution, N(0, 1).
statsmodels' kstest_normal is a KS test with estimated parameters. The null hypothesis is that the data are normally distributed, i.e. that they come from the normal family with arbitrary mean and variance. This is also known as the Lilliefors test (available under that alias in statsmodels).
https://www.statsmodels.org/dev/generated/statsmodels.stats.diagnostic.kstest_normal.html
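To see the difference on the data from the question, here is a minimal sketch that runs scipy's ks_1samp twice: once against N(0, 1), and once against a normal distribution with the sample mean and sample standard deviation plugged in. It assumes that statsmodels standardizes with ddof=1, and the data values are rounded to four decimals, which should not matter at this precision. The names `standard` and `fitted` are just illustrative.

```python
import numpy as np
from scipy import stats

# Data from the question, rounded to four decimals.
data = np.array([
    -0.0324, -0.0358, -0.0357, -0.0295, -0.0227, -0.0146, -0.0125, -0.0103, -0.0182, -0.0137,
    -0.0211, -0.0327, -0.0279, -0.0325, -0.0252, -0.0157, -0.0148, -0.014, -0.0137, -0.0135,
    -0.0042, 0.0044, 0.0212, 0.028, 0.0367, 0.0447, 0.0524, 0.0561, 0.0519, 0.0571,
    0.0424, 0.0459, 0.0496, 0.053, 0.0594, 0.0712, 0.0949, 0.0905, 0.0907, 0.0616,
    0.0235, 0.011, -0.0103, 0.0075, 0.0188, 0.0268, 0.0383, 0.0392, 0.0546, 0.0565,
    0.0651, 0.0681, 0.0622, 0.0619, 0.0569, 0.0583, 0.0495, 0.0531, 0.0612, 0.0572,
    0.0636, 0.0599, 0.0582, 0.0559, 0.051, 0.0491, 0.0423, 0.0373, 0.0331, 0.0226,
    0.0159, 0.0144, 0.0072, 0.0106, 0.0139, 0.0204, 0.0266, 0.0311, 0.0351, 0.0294,
    0.0284, 0.0262, 0.0273, 0.0256, 0.0247, 0.0094, -0.004, -0.0087, -0.0097, -0.0008,
    0.0083, 0.01, 0.0107, 0.0132, 0.0112,
])

# Test 1: fully specified null, N(0, 1). This is what the question's scipy call does.
standard = stats.ks_1samp(data, stats.norm.cdf)

# Test 2: same scipy function, but the reference CDF uses the sample estimates.
# Assumption: ddof=1, chosen to match how statsmodels standardizes the data.
mu, sigma = data.mean(), data.std(ddof=1)
fitted = stats.ks_1samp(data, stats.norm(loc=mu, scale=sigma).cdf)

print("statistic vs N(0, 1):       ", standard.statistic)
print("statistic vs N(mean, std^2):", fitted.statistic)
```

The second statistic should be close to the 0.0954 that statsmodels reports in the question. Its scipy p-value should not be used, though: ks_1samp still assumes the parameters were known in advance, so it will not agree with the Lilliefors p-value.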
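The distribution of the test statistic under the null hypothesis, including its asymptotic distribution, depends on whether the parameters are estimated or not, so the two tests produce different p-values. The statistics themselves differ as well, because the data are compared against N(0, 1) in one case and against a normal distribution with the sample mean and standard deviation in the other.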
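A small simulation can illustrate why the p-values cannot be shared between the two tests. This is only a sketch: the seed and the number of replications are arbitrary choices, and n = 95 is the sample size in the question. It draws samples that really are N(0, 1) and compares the 95% quantile of the KS statistic when the parameters are known with the quantile when they are estimated from each sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n, n_sim = 95, 1000             # sample size as in the question; replications are arbitrary

d_known = np.empty(n_sim)
d_estimated = np.empty(n_sim)
for i in range(n_sim):
    x = rng.standard_normal(n)
    # Parameters known: compare directly against N(0, 1).
    d_known[i] = stats.ks_1samp(x, stats.norm.cdf).statistic
    # Parameters estimated: standardize with the sample mean and std first.
    z = (x - x.mean()) / x.std(ddof=1)
    d_estimated[i] = stats.ks_1samp(z, stats.norm.cdf).statistic

# Simulated 5% critical values under each null hypothesis.
crit_known = np.quantile(d_known, 0.95)
crit_estimated = np.quantile(d_estimated, 0.95)
print("5% critical value, parameters known:    ", crit_known)
print("5% critical value, parameters estimated:", crit_estimated)
```

Fitting the parameters pulls the reference CDF toward the sample, so the statistic tends to be smaller and the critical value with estimated parameters comes out noticeably lower. A statistic that looks unremarkable under the fully specified test can therefore still be significant under the Lilliefors test.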
Upvotes: 1