Reputation: 147
I am trying to reproduce the simulation results in Razali & Wah's 2011 article "Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, and Anderson-Darling tests", which appeared in the Journal of Statistical Modeling and Analytics. My question is how to produce the following figure in R (or Python), only for the Kolmogorov-Smirnov test and the Lilliefors test.
Our hypothesis testing problem is: given $X_1,\dots, X_n \sim$ unknown CDF $F$, let $F_n$ be the empirical CDF. We want to test $$H_0: F=F_0 \quad \text{vs.} \quad H_1: F\neq F_0,$$ where $F_0$ is the normal distribution $N(0,1)$. Our test statistic is $$ T=\sup_{x\in \mathbb{R}} |F_n(x)-F_0(x)|.$$
If $T$ exceeds the $1-\alpha$ quantile of its null distribution, we reject $H_0$ at significance level $\alpha$.
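For concreteness, here is a minimal sketch (my own illustration, assuming the fully specified null $F_0 = N(0,1)$) of how $T$ can be computed directly from the empirical CDF; the supremum is attained at the sample points, just before or at each jump of $F_n$:
import numpy as np
from scipy import stats

# Sketch: compute T = sup_x |F_n(x) - F_0(x)| for a fully specified
# null F_0 = N(0, 1); x is an example sample, not data from the paper
x = np.sort(stats.norm.rvs(size=100, random_state=1))
n = len(x)
cdf0 = stats.norm.cdf(x)  # F_0 evaluated at the order statistics
T = np.max(np.maximum(np.arange(1, n + 1) / n - cdf0,   # D+
                      cdf0 - np.arange(n) / n))          # D-
print(T)  # agrees with stats.kstest(x, 'norm').statistic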
In our Monte Carlo simulation, we take $\alpha=5\%$, the sample size is 100, and $F$ is $\mathrm{Beta}(2,2)$.
This paper says that they use "scipy.stats.ksone".
Upvotes: 1
Views: 275
Reputation: 3873
This can be done in Python with something like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7489746759637963126)

alpha = 0.05  # threshold for significance
dist = stats.beta(2, 2)  # power is against Beta(2, 2)
n_replications = 100  # power simulated with 100 replications
n_points = 15  # 15 different sample sizes
ns = np.logspace(np.log10(10), np.log10(3000), n_points).astype(int)

# Allocate arrays to store the p-values
pvalues_lilliefors = np.zeros((n_points, n_replications))
pvalues_ks = np.zeros((n_points, n_replications))
pvalues_shapiro = np.zeros((n_points, n_replications))

for i, n in enumerate(ns):
    rvs = dist.rvs(size=n, random_state=rng)
    # Perform the Lilliefors test once just to get the null distribution
    # of the test statistic for this sample size
    res = stats.goodness_of_fit(stats.norm, rvs, statistic='ks',
                                random_state=rng)
    null_distribution = res.null_distribution

    for j in range(n_replications):
        rvs = dist.rvs(size=n, random_state=rng)

        # Use goodness_of_fit just to evaluate the Lilliefors statistic
        res = stats.goodness_of_fit(stats.norm, rvs, statistic='ks',
                                    random_state=rng, n_mc_samples=1)
        # Get the p-value of the Lilliefors test based on the null distribution
        p = ((np.sum(null_distribution >= res.statistic) + 1)
             / (len(null_distribution) + 1))
        pvalues_lilliefors[i, j] = p

        # Compute the KS p-value
        # (null hypothesized distribution is a normal fitted to the data)
        loc = np.mean(rvs)
        scale = np.std(rvs)
        p = stats.kstest(rvs, stats.norm(loc=loc, scale=scale).cdf).pvalue
        pvalues_ks[i, j] = p

        # Compute the Shapiro-Wilk p-value
        p = stats.shapiro(rvs).pvalue
        pvalues_shapiro[i, j] = p

# Percentage of replications in which H0 was rejected
rejections_lilliefors = (pvalues_lilliefors < alpha).mean(axis=1)
rejections_ks = (pvalues_ks < alpha).mean(axis=1)
rejections_shapiro = (pvalues_shapiro < alpha).mean(axis=1)

# Plot the results
plt.semilogx(ns, rejections_shapiro, label='SW')
plt.semilogx(ns, rejections_lilliefors, label='Lilliefors')
plt.semilogx(ns, rejections_ks, label='KS')
plt.legend()
plt.xlabel('Sample size `n`')
plt.ylabel('Simulated power')
plt.title('Simulated power of normality tests for Beta(2, 2), 100 replications')
plt.show()
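As a cross-check on the simulated Lilliefors p-values, statsmodels ships the test directly (this assumes statsmodels is installed; it is not used in the code above). A minimal sketch:
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
rvs = stats.beta(2, 2).rvs(size=100, random_state=rng)
# Lilliefors KS test for normality with estimated mean and variance;
# pvalmethod='table' interpolates tabulated critical values
ksstat, pvalue = lilliefors(rvs, dist='norm', pvalmethod='table')
print(ksstat, pvalue)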
A few notes:
The null distribution of the Lilliefors statistic is computed by scipy.stats.goodness_of_fit with 9999 Monte Carlo samples, the default; you may wish to use more (by passing n_mc_samples).
Upvotes: 0