newbie

Reputation: 21

How to implement a KS-Test in Python

scipy.stats.kstest(rvs, cdf, N) can perform a KS test on a dataset rvs. It tests whether the dataset follows a probability distribution whose CDF is specified in the parameters of this method.

Consider now a dataset of N=4800 samples. I have performed a KDE on this data and therefore have an estimated PDF. This PDF looks an awful lot like a bimodal distribution. When I plot the estimated PDF and curve_fit a bimodal distribution to it, the two plots are nearly identical. The parameters of the fitted bimodal distribution are (scale1, mean1, stdv1, scale2, mean2, stdv2): [0.6, 0.036, 0.52, 0.23, 1.25, 0.4]

How can I apply scipy.stats.kstest to test whether my estimated PDF follows this bimodal distribution? As my null hypothesis, I state that the estimated PDF equals the following PDF:

hypoDist = 0.6*norm(loc=0, scale=0.2).pdf(x_grid) + 0.3*norm(loc=1, scale=0.2).pdf(x_grid)
hypoCdf = np.cumsum(hypoDist)/len(x_grid)

x_grid is just a vector that contains the x-values at which I evaluate my estimated PDF, so each entry of pdf has a corresponding value in x_grid. It might be that my computation of hypoCdf is incorrect: maybe instead of dividing by len(x_grid), I should divide by np.sum(hypoDist)?
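
As an aside, a minimal sketch of the normalization in question (assuming x_grid is evenly spaced; the linspace grid below is only illustrative):

import numpy as np
from scipy.stats import norm

x_grid = np.linspace(-2, 3, 1000)   # hypothetical grid
dx = x_grid[1] - x_grid[0]          # grid spacing

hypoDist = 0.6*norm(loc=0, scale=0.2).pdf(x_grid) + 0.3*norm(loc=1, scale=0.2).pdf(x_grid)

# Riemann-sum approximation of the integral of the PDF:
hypoCdf = np.cumsum(hypoDist)*dx

# The weights 0.6 and 0.3 do not sum to 1, so the curve tops out near 0.9;
# dividing by the last value forces a proper CDF that ends at 1:
hypoCdf = hypoCdf/hypoCdf[-1]

Since hypoCdf[-1] equals np.sum(hypoDist)*dx, this is equivalent to dividing the cumulative sum by np.sum(hypoDist); dividing by len(x_grid) is not correct in general.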

Challenge: the cdf parameter of kstest cannot be specified as "bimodal". Nor can I pass it hypoDist.

If I wanted to test whether my dataset was Gaussian distributed, I would write:

KS_result = kstest(measurementError, norm(loc=np.mean(pdf), scale=np.std(pdf)).cdf)
print(KS_result)

measurementError is the dataset on which I performed the KDE. This returns: statistic=0.459, pvalue=0.0. To me, it is a little irritating that the p-value is 0.0.

Upvotes: 2

Views: 5810

Answers (1)

Warren Weckesser

Reputation: 114976

The cdf argument to kstest can be a callable that implements the cumulative distribution function of the distribution against which you want to test your data. To use it, you have to implement the CDF of your bimodal distribution. You want the distribution to be a mixture of two normal distributions. You can implement the CDF for this distribution by computing the weighted sum of the CDFs of the two normal distributions that make up the mixture.

Here's a script that shows how you can do this. To demonstrate how kstest is used, the script runs kstest twice. First it uses a sample that is not from the distribution. As expected, kstest computes a very small p-value for this first sample. It then generates a sample that is drawn from the mixture. For this sample, the p-value is not small.

import numpy as np
from scipy import stats


def bimodal_cdf(x, weight1, mean1, stdv1, mean2, stdv2):
    """
    CDF of a mixture of two normal distributions.
    """
    return (weight1*stats.norm.cdf(x, mean1, stdv1) +
            (1 - weight1)*stats.norm.cdf(x, mean2, stdv2))


# We only need weight1, since weight2 = 1 - weight1.
weight1 = 0.6
mean1 = 0.036
stdv1 = 0.52
mean2 = 1.25
stdv2 = 0.4

n = 200

# Create a sample from a regular normal distribution that has parameters
# similar to the bimodal distribution.
sample1 = stats.norm.rvs(0.5*(mean1 + mean2), 0.5, size=n)

# The result of kstest should show that sample1 is not from the bimodal
# distribution (i.e. the p-value should be very small).
stat1, pvalue1 = stats.kstest(sample1, cdf=bimodal_cdf,
                              args=(weight1, mean1, stdv1, mean2, stdv2))
print("sample1 p-value =", pvalue1)

# Create a sample from the bimodal distribution.  This sample is the
# concatenation of samples from the two normal distributions that make
# up the bimodal distribution.  The number of samples to take from the
# first distribution is determined by a binomial distribution of n
# samples with probability weight1.
n1 = np.random.binomial(n, p=weight1)
sample2 = np.concatenate((stats.norm.rvs(mean1, stdv1, size=n1),
                          stats.norm.rvs(mean2, stdv2, size=n - n1)))

# Most of the time, the p-value returned by kstest with sample2 will not
# be small.  We expect the value to be uniformly distributed in the interval
# [0, 1], so in general it will not be very small.
stat2, pvalue2 = stats.kstest(sample2, cdf=bimodal_cdf,
                              args=(weight1, mean1, stdv1, mean2, stdv2))
print("sample2 p-value =", pvalue2)

Typical output (the numbers will be different each time the script is run):

sample1 p-value = 2.8395166853884146e-11
sample2 p-value = 0.3289374831186403

You might find that this test does not work well for your problem. You have 4800 samples, but the parameters in your code are given to only one or two significant digits. Unless you have good reason to believe that your sample is drawn from a distribution with exactly those parameters, it is likely that kstest will return a very small p-value.
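
If those parameters were in fact estimated from the same 4800 samples, the plain KS p-value is no longer exact even when the model family is right. One common workaround, shown here only as a sketch (it is not part of the answer above, and sample_mixture and params are illustrative names), is a parametric bootstrap: compare the observed KS statistic with statistics from samples drawn from the fitted mixture.

import numpy as np
from scipy import stats


def bimodal_cdf(x, weight1, mean1, stdv1, mean2, stdv2):
    # Same mixture CDF as in the script above.
    return (weight1*stats.norm.cdf(x, mean1, stdv1) +
            (1 - weight1)*stats.norm.cdf(x, mean2, stdv2))


def sample_mixture(n, weight1, mean1, stdv1, mean2, stdv2, rng):
    # The number of draws from the first component is binomial(n, weight1).
    n1 = rng.binomial(n, p=weight1)
    return np.concatenate((rng.normal(mean1, stdv1, size=n1),
                           rng.normal(mean2, stdv2, size=n - n1)))


rng = np.random.default_rng()
params = (0.6, 0.036, 0.52, 1.25, 0.4)
n = 200

# Stand-in for the observed data; in practice use the real sample.
data = sample_mixture(n, *params, rng)
obs_stat, _ = stats.kstest(data, cdf=bimodal_cdf, args=params)

# Null distribution of the KS statistic under the fitted mixture.
# (A fuller treatment would re-fit the parameters to each resample.)
boot_stats = [stats.kstest(sample_mixture(n, *params, rng),
                           cdf=bimodal_cdf, args=params)[0]
              for _ in range(500)]
p_boot = np.mean([s >= obs_stat for s in boot_stats])
print("bootstrap p-value =", p_boot)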

Upvotes: 3
