Reputation: 131
I am trying to generate the plot below. Specifically, the "Mixture" label is really nice for showing that this is a mixture of Gaussians.
I got as far as the second image with the following code. If anyone can tell me how to normalize the two constituent normals (so they sit below the mixture), that would be great:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
n = 10000
mu = [0, 10]
sigma = [1, 1]
samples = []
samples0 = []
samples1 = []
for i in range(n):  # iteratively draw samples
    Z = np.random.choice([0, 1])  # latent variable
    if Z == 0:
        samples0.append(np.random.normal(mu[Z], sigma[Z], 1))
    else:
        samples1.append(np.random.normal(mu[Z], sigma[Z], 1))
    samples.append(np.random.normal(mu[Z], sigma[Z], 1))
sns.distplot(samples, hist=False, kde_kws={"shade": False})
sns.distplot(samples0, hist=False, kde_kws={"shade": True})
sns.distplot(samples1, hist=False, kde_kws={"shade": True})
plt.show()
Upvotes: 1
Views: 1868
Reputation: 16213
As you're using Gaussians, you can do everything analytically pretty easily:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# define your distributions
d1 = stats.norm(2, 1.1)
d2 = stats.norm(5, 0.9)
# set mixture component weights
mc = [0.5, 0.5]
mc = mc / np.sum(mc) # ensuring they sum to 1
# where to evaluate the densities
x = np.linspace(0, 10, 501)
# calculate density and apply mixture weights
c1 = d1.pdf(x) * mc[0]
c2 = d2.pdf(x) * mc[1]
# plot everything
plt.plot(x, c1, label='Component 1')
plt.plot(x, c2, label='Component 2')
plt.plot(x, c1 + c2, '--', label='Mixture')
plt.legend()
plt.show()
giving me:
which is quite close to your first plot.
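As a sanity check (not part of the answer above, just a quick sketch using the same distributions), the weighted mixture should integrate to 1, which you can verify numerically with np.trapz:

```python
import numpy as np
from scipy import stats

# same components and weights as above
d1 = stats.norm(2, 1.1)
d2 = stats.norm(5, 0.9)
mc = [0.5, 0.5]

# evaluate on a grid wide enough to cover both components
x = np.linspace(-5, 15, 2001)
mixture = mc[0] * d1.pdf(x) + mc[1] * d2.pdf(x)

# trapezoidal rule: the area under the mixture density
area = np.trapz(mixture, x)
print(round(area, 3))  # ≈ 1.0
```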
Upvotes: 1
Reputation: 2948
As far as I can tell, seaborn.distplot can only do this if you add points outside the image range as a hack. You can, however, easily replicate seaborn's behavior with scipy and matplotlib. Note that the distributions don't exactly fill the same space, which might be an artifact of the kernel density estimation; you can likely adjust the bandwidth manually to fix this.
Alternatively, if you already know the parameters of the distribution, you could just plot the function values instead of the kernel density estimate, which is what seems to be done in the top picture.
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
n = 10000
mu = [0, 10]
sigma = [1, 1]
samples = []
samples0 = []
samples1 = []
for i in range(n):  # iteratively draw samples
    Z = np.random.choice([0, 1])  # latent variable
    if Z == 0:
        samples0.append(np.random.normal(mu[Z], sigma[Z]))
        samples.append(samples0[-1])
    else:
        samples1.append(np.random.normal(mu[Z], sigma[Z]))
        samples.append(samples1[-1])
grid = np.linspace(min(samples) - 0.5, max(samples) + 0.5, 1000)
y = scipy.stats.gaussian_kde(samples).evaluate(grid)
# Duplicate each component's samples so the KDE bandwidth
# (which depends on sample size) matches the mixture's
y0 = scipy.stats.gaussian_kde(samples0 * 2).evaluate(grid)
y1 = scipy.stats.gaussian_kde(samples1 * 2).evaluate(grid)
# Divide each curve by its maximum so the peak heights match
y /= max(y)
y0 /= max(y0)
y1 /= max(y1)
plt.plot(grid, y0, label='Component 1')
plt.fill_between(grid, 0, y0, alpha=0.5)
plt.plot(grid, y1, label='Component 2')
plt.fill_between(grid, 0, y1, alpha=0.5)
plt.plot(grid, y, '--', label='Mixture')
plt.legend()
plt.show()
Upvotes: 1