Why does a density histogram show weird values on the y-axis?

Question

I have a dataframe of numeric values.

When I plot a histogram with density=True, I get a pretty weird result:

df.plot(kind='hist', density=True)

I know for certain that the first bin covers almost 100% of the values, so I expected its density to be above 0.8. But the plot shows something around 0.04.

How could that happen? Maybe I have the meaning of density wrong. By the way, there are about 800,000 values in the dataframe, in case that matters. Here is the output of df.describe():

count  795846.000000
mean  5.220350
std  20.600285
min  -3.000000
25%  0.000000
50%  0.000000
75%  1.000000
max  247.000000

Andrea · Accepted Answer

If you are interested in probability rather than probability density, I think you want to use weights instead of density. Take a look at this example to see the difference:

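import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': np.random.normal(loc=5, scale=10, size=80000)})
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
# left: density=True (bar areas sum to 1); right: weights of 1/N (bar heights sum to 1)
df.plot(kind='hist', density=True, bins=np.linspace(-100, 100, 30), ax=ax0)
df.plot(kind='hist', bins=np.linspace(-100, 100, 30), weights=np.ones(len(df))/len(df), ax=ax1)
plt.show()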

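If you use density, you normalize by area: each bar's height is count / (N × bin width), so the areas of the bars sum to 1. If you use weights of 1/N instead, you normalize by height: each bar's height is count / N, so the heights of the bars sum to 1. In your case, pandas uses 10 bins by default, so with values running from -3 to 247 each bin is 25 wide, and a first bin holding almost all of the data has a density of roughly 1/25 = 0.04.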
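A minimal NumPy sketch of the two normalizations, assuming hypothetical data shaped roughly like the question's describe() output (mostly zeros, with a range of -3 to 247) rather than the asker's real data. It calls np.histogram directly with 10 equal-width bins over [min, max], which should match the default binning behind df.plot(kind='hist').

```python
import numpy as np

# Hypothetical data (not the asker's): mostly zeros and ones,
# plus the extremes -3 and 247 so the range is 250.
values = np.concatenate([np.zeros(9_000), np.ones(900), [-3.0, 247.0]])

# 10 equal-width bins over [min, max], like the pandas/matplotlib default.
edges = np.linspace(values.min(), values.max(), 11)
widths = np.diff(edges)  # every bin is 25 wide here

# density=True: count / (N * bin_width), so the bar *areas* sum to 1.
dens, _ = np.histogram(values, bins=edges, density=True)

# weights of 1/N: count / N, so the bar *heights* sum to 1.
probs, _ = np.histogram(values, bins=edges,
                        weights=np.ones(len(values)) / len(values))

print(widths[0])              # bin width
print(dens[0], probs[0])      # first bar: about 0.04 vs about 1.0
print((dens * widths).sum())  # total bar area with density=True
print(probs.sum())            # total bar height with 1/N weights
```

The first bar holds almost every value, yet its density is only about 0.04 because that fraction is divided by the bin width of 25; the weighted version reports the fraction itself.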
