canonball
canonball

Reputation: 515

sampling from a clipped normal distribution

How do I sample from a normal distribution that is clipped?

I want to sample from N(0, 1). But I want the values to be from [-1, +1]. I cannot apply np.clip as that would increase the probability of -1 and +1. I can do stochastic clipping, but then there's no guarantee that it'll fall out of the range.

#standard
s = np.random.normal(0, 1, [10,10])
s = np.clip(s)

#stochastic
for j in range(10)
    edge1 = np.where(s[j] >= 1.)[0]
    edge2 = np.where(s[j] <= -1)[0]

    if edge1.shape[0] > 0:
        rand_el1 = np.random.normal(0, 1, size=(1, edge1.shape[0])) 
        s[j,edge1] = rand_el1
    if edge2.shape[0] > 0:
        rand_el2 = np.random.normal(0, 1, size=(1, edge2.shape[0]))                            
        s[j,edge2] = rand_el2

Upvotes: 1

Views: 2051

Answers (3)

Warren Weckesser
Warren Weckesser

Reputation: 114976

The scipy library implements the truncated normal distribution as scipy.stats.truncnorm. In your case, you can use sample = truncnorm.rvs(-1, 1, size=sample_size).

For example,

In [55]: import matplotlib.pyplot as plt

In [56]: from scipy.stats import truncnorm, norm

Sample 100000 points from the normal distribution truncated to [-1, 1]:

In [57]: sample = truncnorm.rvs(-1, 1, size=100000)

Make a histogram, and plot the theoretical PDF curve. The PDF can be computed with truncnorm.pdf, or with a scaled version of norm.pdf.

In [58]: _ = plt.hist(sample, bins=51, normed=True, facecolor='g', edgecolor='k', alpha=0.4)

In [59]: x = np.linspace(-1, 1, 101)

In [60]: plt.plot(x, truncnorm.pdf(x, -1, 1), 'k', alpha=0.4, linewidth=5)
Out[60]: [<matplotlib.lines.Line2D at 0x11f78c160>]

In [61]: plt.plot(x, norm.pdf(x)/(norm.cdf(1) - norm.cdf(-1)), 'k--', linewidth=1)
Out[61]: [<matplotlib.lines.Line2D at 0x11f779f60>]

Here's the plot:

plot

Upvotes: 2

Archeo
Archeo

Reputation: 221

I believe that the simplest (perhaps not most efficient) way to do so is to use basic Rejection sampling. It simply consists in simulating values from a N(0,1), rejecting those that fall out of the wanted bounds and keeping the others until they stack to the wanted number of samples.

kept = []
while len(kept) < 1000:
    s = np.random.normal()
    if -1 <= s <= 1:
        kept.append(s)

Here I stack things in a basic list ; feel free to use a np.array and replace the length condition with one based on the array's dimensions.

Upvotes: 2

Mad Physicist
Mad Physicist

Reputation: 114578

Do the stochiastic clipping iteratively until you don't need it any more. This basically means turning your ifs into a while. You can also take this opportunity to simplify the out-of bounds condition into a single check rather than a separate check on each side:

s = np.random.normal(0, 1, (10, 10))
while True:
    out_of_bounds = np.abs(s) > 1
    count = np.count_nonzero(out_of_bounds)
    if count:
        s[out_of_bounds] = np.random.normal(0, 1, count)
    else:
        break

Upvotes: 1

Related Questions