Tomislav
Tomislav

Reputation: 131

Generating and plotting a mixture of gaussians with matplotlib and seaborn

I am trying to generate this plot below. Specifically, the mixture label is really nice to display that this is a mixture of gaussians.

enter image description here

I got as far as the second image here with the following code if anyone can help let me know how to normalize the two constituent normals (so they are below the mixture) that would be great:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

n = 10000
mu = [0, 10]
sigma = [1, 1]
samples = []
samples0 = []
samples1 = []
for i in range(n):  # iteratively draw samples
    Z = np.random.choice([0, 1])  # latent variable
    if Z == 0:
        samples0.append(np.random.normal(mu[Z], sigma[Z], 1))
    else:
        samples1.append(np.random.normal(mu[Z], sigma[Z], 1))

    samples.append(np.random.normal(mu[Z], sigma[Z], 1))
sns.distplot(samples, hist=False, kde_kws={"shade": False})
sns.distplot(samples0, hist=False, kde_kws={"shade": True})
sns.distplot(samples1, hist=False, kde_kws={"shade": True})
plt.show()

t

Upvotes: 1

Views: 1868

Answers (2)

Sam Mason
Sam Mason

Reputation: 16213

as you're using Gaussians you can do everything empirically pretty easily:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# define your distributions
d1 = stats.norm(2, 1.1)
d2 = stats.norm(5, 0.9)

# set mixture component weights
mc = [0.5, 0.5]
mc = mc / np.sum(mc) # ensuring they sum to 1

# where to evaluate the densities
x = np.linspace(0, 10, 501)
# calculate density and apply mixture weights
c1 = d1.pdf(x) * mc[0]
c2 = d2.pdf(x) * mc[1]

# plot everything
plt.plot(x, c1, label='Component 1')
plt.plot(x, c2, label='Component 2')
plt.plot(x, c1 + c2, '--', label='Mixture')

giving me:

GMM

which is quite close to your first plot

Upvotes: 1

user2653663
user2653663

Reputation: 2948

As far as I can tell, if you want to use seaborn.distplot you can only do this by adding points outside the image range as a hack. You can however easily replicate seaborn behavior with scipy and matplotlib. Note that the distributions don't exactly fill the same space, which might be an artifact of the kernel density estimation. You can likely adjust the bandwidth manually to fix this.

Alternatively, if you already know the parameters of the distribution, you could just plot the function values instead of the kernel density estimate, which is what seems to be done in the top picture.

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

n = 10000
mu = [0, 10]
sigma = [1, 1]
samples = []
samples0 = []
samples1 = []
for i in range(n):  # iteratively draw samples
    Z = np.random.choice([0, 1])  # latent variable
    if Z == 0:
        samples0.append(np.random.normal(mu[Z], sigma[Z]))
        samples.append(samples0[-1])
    else:
        samples1.append(np.random.normal(mu[Z], sigma[Z]))
        samples.append(samples1[-1])

grid = np.linspace(min(samples)-0.5, max(samples)+0.5,1000)
y = scipy.stats.gaussian_kde(samples).evaluate(grid)
# Double the number of points to make sure the bandwidth in the KDE will be the same
y0 = scipy.stats.gaussian_kde(samples0*2).evaluate(grid)
y1 = scipy.stats.gaussian_kde(samples1*2).evaluate(grid)
# Multiply by maximum height to scale
y /= max(y)
y0 /= max(y0)
y1 /= max(y1)
plt.plot(grid, y0, label='Component 1')
plt.fill_between(grid, 0, y0, alpha=0.5)
plt.plot(grid, y1, label='Component 2')
plt.fill_between(grid, 0, y1, alpha=0.5)
plt.plot(grid, y, '--', label='Mixture')
plt.legend()
plt.show()

Upvotes: 1

Related Questions