Reputation: 2134
Consider a sum of n
squared iid normal random variables S = sum (Z^2(mu, sig^2))
. According to this question, S / sig^2
has a noncentral chi-squared distribution with degrees of freedom = n
and non-centrality parameter = n*mu^2
.
However, compare generating N
of these variables S
by summing squared normals with generating N
noncentral chi-squared random variables directly using scipy.ncx2
:
import numpy as np
from scipy.stats import ncx2, chi2
import matplotlib.pyplot as plt
n = 1000 # number of normals in sum
N_MC = 100000 # number of trials
mu = 0.05
sig = 0.3
### Generate sums of squared normals ###
Z = np.random.normal(loc=mu, scale=sig, size=(N_MC, n))
S = np.sum(Z**2, axis=1)
### Generate non-central chi2 RVs directly ###
dof = n
non_centrality = n*mu**2
NCX2 = sig**2 * ncx2.rvs(dof, non_centrality, size=N_MC)
# NCX2 = sig**2 * chi2.rvs(dof, size=N_MC) # for mu = 0.0
### Plot histos ###
fig, ax = plt.subplots()
ax.hist(S, bins=50, label='S')
ax.hist(NCX2, bins=50, label='NCX2', alpha=0.7)
ax.legend()
plt.show()
This results in the histograms
I believe the mathematics is correct; could the discrepancy be a bug in the ncx2
implementation? Setting mu = 0
and using scipy.chi2
looks much better:
Upvotes: 4
Views: 787
Reputation: 114946
The problem is in the second sentence of the question: "S / sig^2
has a noncentral chi-squared distribution with degrees of freedom = n
and non-centrality parameter = n*mu^2
." That non-centrality parameter is not correct. It should be n*(mu/sig)^2
.
The standard definition of the noncentral chi-squared distribution is that it is the sum of the squares of normal variates that have mean mu and standard deviation 1. You are computing S
using normal variates with standard deviation sig
. Let's write that distribution as N(mu, sig**2)
. By using the location-scale properties of the normal distribution, we have
N(mu, sig**2) = mu + sig*N(0, 1) = sig*(mu/sig + N(0,1)) = sig*N(mu/sig, 1)
So summing the squares of variates from N(mu, sig**2)
is equivalent to summing the squares of sig*N(mu/sig, 1)
. That gives sig**2
times a noncentral chi-squared variate with noncentrality mu/sig
.
If you change the line where non_centrality
is computed to
non_centrality = n*(mu/sig)**2
the histograms line up as you expect.
Upvotes: 3