Javide

Reputation: 2637

Scipy kstest returns different p-values for similar sets of values

In Python 3.6.5 and scipy 1.1.0, when I run a Kolmogorov-Smirnov test to check whether a sample follows a uniform distribution, I get two different p-values depending on whether I pass the kstest function a column vector or a row vector:

>>> from scipy import stats
>>> import numpy as np

>>> np.random.seed(seed=123)
>>> stats.kstest(np.random.uniform(low=0, high=1, size=(10000, 1)), 'uniform')

KstestResult(statistic=0.9999321616877249, pvalue=0.0)

>>> np.random.seed(seed=123)
>>> stats.kstest(np.random.uniform(low=0, high=1, size=(1, 10000)), 'uniform')

KstestResult(statistic=0.9999321616877249, pvalue=0.00013567662455016283)

Do you know why this would be the case?

Upvotes: 1

Views: 476

Answers (1)

Warren Weckesser

Reputation: 114781

The docstring of kstest says that when the first argument is an array, it is expected to be one-dimensional. In your examples, you are passing two-dimensional arrays (with one trivial dimension in each case), and the code in kstest does not do what you expect when the input array is two-dimensional.
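To illustrate how a two-dimensional input can go wrong, here is a NumPy-only sketch. It assumes, without checking the scipy 1.1.0 source, that kstest takes the sample size from len() and compares the sorted CDF values against a one-dimensional np.arange grid. The variable names are made up for this illustration, and the sample is smaller than yours to keep the broadcast array small.

```python
import numpy as np

n = 1000  # smaller than the question's 10000 to keep memory use low
rng = np.random.RandomState(123)
col = rng.uniform(low=0, high=1, size=(n, 1))  # column vector
row = col.reshape(1, n)                        # same values as a row vector

# len() reports only the first dimension, so the apparent sample size differs.
n_col = len(col)  # n
n_row = len(row)  # 1

# np.sort works along the last axis: the column is left unsorted,
# because each of its rows holds a single element.
col_sorted = np.sort(col)
row_sorted = np.sort(row)

# Assumed form of the "D+" step (for the uniform distribution the CDF of x is x).
# A 1-D grid minus a 2-D array broadcasts instead of pairing element by element.
diff_col = np.arange(1.0, n_col + 1) / n_col - col_sorted  # shape (n, n)
diff_row = np.arange(1.0, n_row + 1) / n_row - row_sorted  # shape (1, n)

print(n_col, n_row)
print(diff_col.shape, diff_row.shape)
print(diff_col.max(), diff_row.max())  # both equal 1 - min(sample)
```

Under that assumption, both shapes give the same statistic, close to 1, but the p-value is computed for a different apparent sample size, which would be consistent with the two results in the question.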

The easy fix is to flatten the array before passing it to kstest, for example with the ravel() method:

In [50]: np.random.seed(seed=123)

In [51]: x = np.random.uniform(low=0, high=1, size=(10000, 1))

In [52]: stats.kstest(x.ravel(), 'uniform')
Out[52]: KstestResult(statistic=0.008002577626569918, pvalue=0.5437230826096209)

In [53]: np.random.seed(seed=123)

In [54]: x = np.random.uniform(low=0, high=1, size=(1, 10000))

In [55]: stats.kstest(x.ravel(), 'uniform')
Out[55]: KstestResult(statistic=0.008002577626569918, pvalue=0.5437230826096209)
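If you call kstest in several places, the flattening can be wrapped in a small helper. This is only a sketch: kstest_flat is a hypothetical name, not part of scipy, and it assumes every element of the input belongs to a single sample.

```python
import numpy as np
from scipy import stats


def kstest_flat(sample, dist="uniform"):
    """Hypothetical helper: flatten the sample to 1-D, then run stats.kstest."""
    flat = np.asarray(sample).ravel()
    return stats.kstest(flat, dist)


rng = np.random.RandomState(123)
values = rng.uniform(low=0, high=1, size=10000)

# The same values in three shapes give the same result once flattened.
res_1d = stats.kstest(values, "uniform")
res_col = kstest_flat(values.reshape(-1, 1))
res_row = kstest_flat(values.reshape(1, -1))

print(res_1d)
print(res_col)
print(res_row)
```

Note that ravel() returns a view when it can, so the helper does not copy a contiguous array.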

Upvotes: 2
