Plot a density function above a histogram

In Python, I have estimated the parameters of a density model for my data, and I would like to plot that density function on top of the histogram of the data. In R this is similar to using the option prob=TRUE.

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

# initialization of the list "data"
# estimation of the parameter, in my case, mean and variance of a normal distribution

plt.hist(data, bins="auto") # data is the list of data
# here I would like to draw the density above the histogram
plt.show()

I guess the trickiest part is to make it fit.

Edit: I have tried this according to the first answer:

mean = np.mean(logdata)
var  = np.var(logdata)
std  = np.sqrt(var) # standard deviation; the numpy/matplotlib functions take it instead of the variance
plt.hist(logdata, bins="auto", alpha=0.5, label="empirical data")
x = np.linspace(min(logdata), max(logdata), 100)
plt.plot(x, mlab.normpdf(x, mean, std))
plt.xlabel("log(file size)")
plt.ylabel("number of files")
plt.legend(loc='upper right')
plt.grid(True)
plt.show()

But the curve doesn't fit the histogram: with the Python code above, the density values are far too small compared with the bar heights. I would like the density to fit the histogram.

Edit 2: It works with the option normed=True in the histogram function. The result should look like what R produces with the option prob=TRUE.

Upvotes: 1

Views: 8676

Answers (2)

Vasiliy Kirstia

Reputation: 43

Everything you are looking for is already in seaborn.

You just have to use distplot (deprecated since seaborn 0.11 in favor of histplot and displot).

import seaborn as sns
import numpy as np

data = np.random.normal(5, 2, size=1000)
sns.distplot(data)


Upvotes: 0

DavidG

Reputation: 25400

If I understand you correctly, you have the mean and standard deviation of some data. You have plotted a histogram of the data and would like to draw the normal density curve over it. This curve can be generated with matplotlib.mlab.normpdf() (removed in Matplotlib 3.1; scipy.stats.norm.pdf() computes the same density).

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm  # norm.pdf replaces mlab.normpdf, removed in Matplotlib 3.1

mean = 100
sigma = 5

data = np.random.normal(mean, sigma, 1000)  # generate fake data
x = np.linspace(min(data), max(data), 100)

# density=True (formerly normed=True) rescales the bars so their total area is 1
plt.hist(data, bins="auto", density=True)
plt.plot(x, norm.pdf(x, mean, sigma))

plt.show()

Which gives a figure with the normal density curve drawn over the normalized histogram.
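
The reason normalization matters can be checked numerically. A density integrates to 1, while a histogram of raw counts with equal-width bins has a total bar area of N times the bin width, so an unscaled density curve looks far too small next to count bars. The sketch below uses only NumPy; the seed and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(100, 5, size=1000)  # fake data, as in the answer

# Same bin edges for both histograms so the two are directly comparable
counts, edges = np.histogram(data, bins="auto")
heights, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

area_counts = np.sum(counts * widths)    # N * bin width for equal-width bins
area_density = np.sum(heights * widths)  # 1 by construction

print(area_counts, area_density)
```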

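Edit: Overlaying the density this way only works when the histogram is normalized (normed=True, which newer Matplotlib versions spell density=True). If this is not an option, we can define our own function: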

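def gauss_function(x, a, x0, sigma):
    # Unnormalized Gaussian: peak height a, centre x0, width sigma
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

mean = 100
sigma = 5

data = np.random.normal(mean, sigma, 1000)  # generate fake data
x = np.linspace(min(data), max(data), 1000)

# plt.hist returns the bin counts; the tallest bin sets the curve's peak height
# (max(data) would be the largest sample value, not a bar height)
counts, bins, _ = plt.hist(data, bins="auto")
test = gauss_function(x, counts.max(), mean, sigma)
plt.plot(x, test)

plt.show()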
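
As an alternative to choosing the amplitude by hand, the density can be rescaled to count units: for equal-width bins, the expected count at x is N times the bin width times the density at x. The following is a sketch under that assumption; expected_counts is a hypothetical helper name, and the parameters are estimated from the data as in the question.

```python
import numpy as np

def expected_counts(x, mean, sigma, n, bin_width):
    """Normal density at x, rescaled to the height of a count histogram.

    Hypothetical helper for illustration; assumes equal-width bins.
    """
    pdf = np.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return n * bin_width * pdf

rng = np.random.default_rng(0)
data = rng.normal(100, 5, size=1000)  # fake data

mean, sigma = data.mean(), data.std()  # parameters estimated from the data
counts, edges = np.histogram(data, bins="auto")
centers = (edges[:-1] + edges[1:]) / 2
curve = expected_counts(centers, mean, sigma, len(data), edges[1] - edges[0])

# Both sums should be close to the number of samples
print(curve.sum(), counts.sum())
```

To draw it, plt.plot(centers, curve) can be called after plt.hist(data, bins="auto"), which uses the same bin edges for the same data.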

Upvotes: 2
