I want to plot a histogram of normally distributed data and overlay the corresponding normal distribution curve on it. There are several examples online of normal distributions with the y-axis normalized using density=True. In my example, I am trying to draw the normal distribution curve without the density normalization. Perhaps this is implicitly a mathematical question, but I could not figure out how to "un-normalize" the distribution curve. Here is my code:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
mu = 1e-3
std = 1.0e-4
nsize = 10000
ymax = 5000
# Generate some data for this demonstration.
data = norm.rvs(mu, std, size=nsize)
# Plot the histogram.
plt.hist(data, bins=20, color='b', edgecolor='black')
# Plot the PDF.
xmin, xmax = [0.5e-3, 1.5e-3] #plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std) # something to do with this line
plt.plot(x, p, 'k', linewidth=2)
plt.axvline(mu, linestyle='dashed', color='black')
plt.ylim([0, ymax])
This produces the following plot.
As can be seen, the bar heights of the histogram sum to 10000 (nsize), which is the number of data points. However, this is not the case for the "distribution curve". How can I make the curve match the histogram?
Upvotes: 4
Views: 4056
It looks like plt.hist returns hist, whose counts total nsize. So we can just scale p:
# Plot the histogram.
hist, bins, _ = plt.hist(data, bins=20, color='b', edgecolor='black')
# Plot the PDF.
xmin, xmax = [0.5e-3, 1.5e-3] #plt.xlim()
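# changes here: evaluate the PDF at the bin edges and rescale it
# so that the plotted values sum to nsize, like the bar heights
p = norm.pdf(bins, mu, std)
plt.plot(bins, p/p.sum() * nsize, 'r', linewidth=2)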
Output:
Upvotes: 6
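A note on the scaling in the answer above, offered as a sketch rather than the answerer's own method: with raw counts on the y-axis, the expected height of a bar of width w around x is roughly nsize * w * pdf(x), so the density can be multiplied by nsize times the bin width. Unlike p/p.sum(), this factor does not depend on where the curve is sampled, so a finer x grid can be used for a smoother line. The fixed seed (random_state=0) and the 200-point grid are assumptions added for illustration, and the bins are assumed to be equal-width, which holds for bins=20.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

mu = 1e-3
std = 1.0e-4
nsize = 10000

# Fixed seed is an assumption, used only to make the sketch reproducible.
data = norm.rvs(mu, std, size=nsize, random_state=0)

# Without density=True the bar heights are raw counts that sum to nsize.
counts, edges, _ = plt.hist(data, bins=20, color='b', edgecolor='black')
bin_width = edges[1] - edges[0]  # all bins share this width

# Expected count in a bin around x is about nsize * bin_width * pdf(x).
x = np.linspace(edges[0], edges[-1], 200)
curve = norm.pdf(x, mu, std) * nsize * bin_width

plt.plot(x, curve, 'r', linewidth=2)
plt.axvline(mu, linestyle='dashed', color='black')
```

In a script, add plt.show() at the end to display the figure.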