Reputation: 131
I am trying to generate the plot below. Specifically, the "Mixture" label is really nice for showing that this is a mixture of Gaussians.
I got as far as the second image with the following code. If anyone can tell me how to normalize the two constituent normals (so they sit below the mixture), that would be great:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
n = 10000
mu = [0, 10]
sigma = [1, 1]
samples = []
samples0 = []
samples1 = []
for i in range(n):  # iteratively draw samples
    Z = np.random.choice([0, 1])  # latent variable
    if Z == 0:
        samples0.append(np.random.normal(mu[Z], sigma[Z], 1))
    else:
        samples1.append(np.random.normal(mu[Z], sigma[Z], 1))
    samples.append(np.random.normal(mu[Z], sigma[Z], 1))
sns.distplot(samples, hist=False, kde_kws={"shade": False})
sns.distplot(samples0, hist=False, kde_kws={"shade": True})
sns.distplot(samples1, hist=False, kde_kws={"shade": True})
plt.show()
Upvotes: 1
Views: 1868
Reputation: 16213
As you're using Gaussians, you can do everything analytically pretty easily:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# define your distributions
d1 = stats.norm(2, 1.1)
d2 = stats.norm(5, 0.9)
# set mixture component weights
mc = [0.5, 0.5]
mc = mc / np.sum(mc) # ensuring they sum to 1
# where to evaluate the densities
x = np.linspace(0, 10, 501)
# calculate density and apply mixture weights
c1 = d1.pdf(x) * mc[0]
c2 = d2.pdf(x) * mc[1]
# plot everything
plt.plot(x, c1, label='Component 1')
plt.plot(x, c2, label='Component 2')
plt.plot(x, c1 + c2, '--', label='Mixture')
plt.legend()
plt.show()
giving me:
which is quite close to your first plot.
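As a sanity check (not part of the answer above, just a quick sketch using the same distributions), the weighted mixture should integrate to 1, which you can verify numerically with np.trapz:

```python
import numpy as np
from scipy import stats

# same components and weights as above
d1 = stats.norm(2, 1.1)
d2 = stats.norm(5, 0.9)
mc = [0.5, 0.5]

# evaluate on a grid wide enough to cover both components
x = np.linspace(-5, 15, 2001)
mixture = mc[0] * d1.pdf(x) + mc[1] * d2.pdf(x)

# trapezoidal rule: the area under the mixture density
area = np.trapz(mixture, x)
print(round(area, 3))  # ≈ 1.0
```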
Upvotes: 1
Reputation: 2948
As far as I can tell, seaborn.distplot can only do this if you add points outside the image range as a hack. You can, however, easily replicate seaborn's behavior with scipy and matplotlib. Note that the distributions don't exactly fill the same space, which might be an artifact of the kernel density estimation; you can likely adjust the bandwidth manually to fix this.
Alternatively, if you already know the parameters of the distribution, you could just plot the function values instead of the kernel density estimate, which is what seems to be done in the top picture.
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
n = 10000
mu = [0, 10]
sigma = [1, 1]
samples = []
samples0 = []
samples1 = []
for i in range(n):  # iteratively draw samples
    Z = np.random.choice([0, 1])  # latent variable
    if Z == 0:
        samples0.append(np.random.normal(mu[Z], sigma[Z]))
        samples.append(samples0[-1])
    else:
        samples1.append(np.random.normal(mu[Z], sigma[Z]))
        samples.append(samples1[-1])
grid = np.linspace(min(samples) - 0.5, max(samples) + 0.5, 1000)
y = scipy.stats.gaussian_kde(samples).evaluate(grid)
# Duplicate each component's samples so the KDE bandwidth
# (which depends on sample size) matches the mixture's
y0 = scipy.stats.gaussian_kde(samples0 * 2).evaluate(grid)
y1 = scipy.stats.gaussian_kde(samples1 * 2).evaluate(grid)
# Divide each curve by its maximum so the peak heights match
y /= max(y)
y0 /= max(y0)
y1 /= max(y1)
plt.plot(grid, y0, label='Component 1')
plt.fill_between(grid, 0, y0, alpha=0.5)
plt.plot(grid, y1, label='Component 2')
plt.fill_between(grid, 0, y1, alpha=0.5)
plt.plot(grid, y, '--', label='Mixture')
plt.legend()
plt.show()
Upvotes: 1