In Python 3.6.5 and scipy 1.1.0, when I run a Kolmogorov-Smirnov test against a uniform distribution, I get opposite results (in terms of the p-value) depending on whether I pass a row vector or a column vector to the kstest function:
>>> from scipy import stats
>>> import numpy as np
>>> np.random.seed(seed=123)
>>> stats.kstest(np.random.uniform(low=0, high=1, size=(10000, 1)), 'uniform')
KstestResult(statistic=0.9999321616877249, pvalue=0.0)
>>> np.random.seed(seed=123)
>>> stats.kstest(np.random.uniform(low=0, high=1, size=(1, 10000)), 'uniform')
KstestResult(statistic=0.9999321616877249, pvalue=0.00013567662455016283)
Do you know why this would be the case?
The docstring of kstest says that when the first argument is an array, it is expected to be one-dimensional. In your examples you are passing two-dimensional arrays (in each case one of the dimensions has length 1), and the code in kstest does not do what you expect with two-dimensional input. A hint that something is wrong is the statistic itself: about 0.9999 in both cases, which is far too large for a sample that really is uniform.
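To see why the statistic is close to 1 in both cases while the p-values differ, here is a numpy-only sketch. It assumes kstest in scipy 1.1.0 computes the two-sided statistic the textbook way: sort the input, take N = len(input), and compare the empirical CDF steps i/N with the CDF values. For a 2-D input, np.sort works along the last axis and len() returns the size of the first axis, so the subtraction broadcasts into a grid instead of pairing each observation with its own step. ks_uniform_statistic is a hypothetical helper for illustration, not a scipy function, and the sample has 1000 values (drawn with numpy's newer Generator API) to keep the broadcast grid small.

```python
import numpy as np

def ks_uniform_statistic(sample):
    """Two-sided one-sample KS statistic D against Uniform(0, 1).

    Hypothetical helper: a sketch of the textbook computation, assumed
    to mirror what kstest did internally in older scipy versions.
    Returns (D, n) so the sample size that was used is visible.
    """
    x = np.sort(sample)          # for 2-D input this sorts along the LAST axis
    n = len(x)                   # len() is the size of the FIRST axis
    cdf = np.clip(x, 0.0, 1.0)   # CDF of Uniform(0, 1) is F(x) = x on [0, 1]
    # For 1-D input these are elementwise comparisons; for 2-D input the
    # (n,) step vector broadcasts against the 2-D CDF array instead.
    d_plus = (np.arange(1.0, n + 1) / n - cdf).max()
    d_minus = (cdf - np.arange(0.0, n) / n).max()
    return max(d_plus, d_minus), n

rng = np.random.default_rng(123)
column = rng.uniform(0.0, 1.0, size=(1000, 1))
row = column.T  # same values, shape (1, 1000)

print(ks_uniform_statistic(column.ravel()))  # small D, n = 1000
print(ks_uniform_statistic(column))          # D close to 1, n = 1000
print(ks_uniform_statistic(row))             # same D close to 1, n = 1
```

Under that assumption, both 2-D inputs reduce to the same statistic, max(1 - min(x), max(x)), but the p-value is then computed with N = 1000 for the column and N = 1 for the row, which matches the pattern in the question (a p-value of 0.0 for the column vector and a small but nonzero one for the row vector).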
The easy fix is to flatten the array before passing it to kstest. The ravel() method does that. For example:
In [50]: np.random.seed(seed=123)
In [51]: x = np.random.uniform(low=0, high=1, size=(10000, 1))
In [52]: stats.kstest(x.ravel(), 'uniform')
Out[52]: KstestResult(statistic=0.008002577626569918, pvalue=0.5437230826096209)
In [53]: np.random.seed(seed=123)
In [54]: x = np.random.uniform(low=0, high=1, size=(1, 10000))
In [55]: stats.kstest(x.ravel(), 'uniform')
Out[55]: KstestResult(statistic=0.008002577626569918, pvalue=0.5437230826096209)
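If the data can arrive as either a row or a column, a small guard can flatten it while still rejecting a genuinely two-dimensional array, which ravel() on its own would silently pool into a single sample. as_1d_sample is a hypothetical helper for illustration, not part of numpy or scipy:

```python
import numpy as np

def as_1d_sample(data):
    """Return `data` as a 1-D float array, refusing genuinely 2-D input."""
    arr = np.asarray(data, dtype=float)
    # A row or column vector has at most one axis longer than 1.
    non_trivial = sum(1 for size in arr.shape if size > 1)
    if non_trivial > 1:
        raise ValueError(f"expected a vector, got shape {arr.shape}")
    return arr.ravel()

np.random.seed(seed=123)
column = np.random.uniform(low=0, high=1, size=(10000, 1))
np.random.seed(seed=123)
row = np.random.uniform(low=0, high=1, size=(1, 10000))

# Both shapes become the same 1-D sample of 10000 values.
print(as_1d_sample(column).shape, as_1d_sample(row).shape)  # (10000,) (10000,)
print(np.array_equal(as_1d_sample(column), as_1d_sample(row)))  # True
```

The result can then be passed on as stats.kstest(as_1d_sample(x), 'uniform'), which gives the same answer for the row and the column vector.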