Reputation: 11
import numpy as np
from scipy.stats import multivariate_normal, ks_1samp
data = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=1000)
cdfx = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]]).cdf
ks_1samp(x=data, cdf=cdfx)
KstestResult(statistic=0.9930935227267083, pvalue=0.0)
Shouldn't the p-value be high?
Upvotes: 0
Views: 386
Reputation: 114946
The Kolmogorov–Smirnov test is for univariate distributions. See the section "The Kolmogorov–Smirnov statistic in more than one dimension" of the Wikipedia article on the Kolmogorov–Smirnov test for a discussion of a multivariate generalization.
ks_1samp expects the input x to be one-dimensional, and it expects the cdf function to be the CDF of a univariate distribution. It does not validate these properties, so the behavior is undefined (and, clearly, nonsense) if the expectations are not met.
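Since the function does not check its input for you, one option is to check it yourself before calling it. The sketch below is only an illustration: ks_1samp_checked is a hypothetical helper name, not part of SciPy, and it only guards the shape of the sample (it cannot verify that the cdf you pass is univariate).

```python
import numpy as np
from scipy.stats import ks_1samp, norm


def ks_1samp_checked(x, cdf):
    """Hypothetical wrapper (not a SciPy function): reject samples that are not 1-D."""
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError(f"expected a 1-D sample, got shape {x.shape}")
    return ks_1samp(x, cdf)


rng = np.random.default_rng(0)

# A 1-D sample passes straight through to ks_1samp.
res = ks_1samp_checked(rng.standard_normal(1000), norm.cdf)

# A 2-D sample, like the one in the question, is rejected up front.
try:
    ks_1samp_checked(rng.standard_normal((1000, 2)), norm.cdf)
except ValueError as exc:
    message = str(exc)
    print(message)
```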
With the univariate normal distribution, it works as you expect:
In [20]: from scipy.stats import ks_1samp, norm
In [21]: x = norm.rvs(size=1000)
In [22]: ks_1samp(x, norm.cdf)
Out[22]: KstestResult(statistic=0.025983100250768443, pvalue=0.5011047711453744)
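If you still want to use ks_1samp on the two-dimensional data from the question, one possibility is to test each column on its own against the standard normal CDF. This is a rough sketch, not a test of the joint distribution: it only checks the two marginals, it assumes the mean and covariance from the question (so each marginal is standard normal), and the seeded generator is my own addition for reproducibility.

```python
import numpy as np
from scipy.stats import ks_1samp, norm

# Seeded generator (an assumption, added so the run is repeatable).
rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=1000)

# Each column is a 1-D sample, which is what ks_1samp expects.
results = [ks_1samp(data[:, j], norm.cdf) for j in range(data.shape[1])]

for j, res in enumerate(results):
    print(f"marginal {j}: statistic={res.statistic:.4f}, pvalue={res.pvalue:.4f}")
```

Passing both marginal tests does not establish that the joint distribution is bivariate normal; it only fails to reject normality of each coordinate separately.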
Upvotes: 2