bcf
bcf

Reputation: 2134

Scipy Non-central Chi-Squared Random Variable

Consider a sum of n squared iid normal random variables S = sum (Z^2(mu, sig^2)). According to this question, S / sig^2 has a noncentral chi-squared distribution with degrees of freedom = n and non-centrality parameter = n*mu^2.

However, compare generating N of these variables S by summing squared normals with generating N noncentral chi-squared random variables directly using scipy.ncx2:

import numpy as np
from scipy.stats import ncx2, chi2
import matplotlib.pyplot as plt

n = 1000  # number of normals in sum
N_MC = 100000  # number of trials

mu = 0.05
sig = 0.3

### Generate sums of squared normals ###
Z = np.random.normal(loc=mu, scale=sig, size=(N_MC, n))
S = np.sum(Z**2, axis=1)

### Generate non-central chi2 RVs directly ###
dof = n
non_centrality = n*mu**2
NCX2 = sig**2 * ncx2.rvs(dof, non_centrality, size=N_MC)
# NCX2 = sig**2 * chi2.rvs(dof, size=N_MC)  # for mu = 0.0

### Plot histos ###
fig, ax = plt.subplots()
ax.hist(S, bins=50, label='S')
ax.hist(NCX2, bins=50, label='NCX2', alpha=0.7)
ax.legend()
plt.show()

This results in the histograms comparison of distros

I believe the mathematics is correct; could the discrepancy be a bug in the ncx2 implementation? Setting mu = 0 and using scipy.chi2 looks much better: distros good

Upvotes: 4

Views: 787

Answers (1)

Warren Weckesser
Warren Weckesser

Reputation: 114946

The problem is in the second sentence of the question: "S / sig^2 has a noncentral chi-squared distribution with degrees of freedom = n and non-centrality parameter = n*mu^2." That non-centrality parameter is not correct. It should be n*(mu/sig)^2.

The standard definition of the noncentral chi-squared distribution is that it is the sum of the squares of normal variates that have mean mu and standard deviation 1. You are computing S using normal variates with standard deviation sig. Let's write that distribution as N(mu, sig**2). By using the location-scale properties of the normal distribution, we have

N(mu, sig**2) = mu + sig*N(0, 1) = sig*(mu/sig + N(0,1)) = sig*N(mu/sig, 1)

So summing the squares of variates from N(mu, sig**2) is equivalent to summing the squares of sig*N(mu/sig, 1). That gives sig**2 times a noncentral chi-squared variate with noncentrality mu/sig.

If you change the line where non_centrality is computed to

non_centrality = n*(mu/sig)**2

the histograms line up as you expect.

Upvotes: 3

Related Questions