Ruslan

Reputation: 423

Why does a density histogram show odd values on the y-axis?

I have a dataframe with values:

user value
1    0
2    1
3    4
4    2
5    1

When I try to plot a histogram with density=True, it shows a pretty weird result:

df.plot(kind='hist', density=True)


I know for certain that the first bin covers almost 100% of the values, so I expected the density there to be more than 0.8. But the plot shows something around 0.04.

How could that happen? Maybe I have the meaning of density wrong. By the way, there are about 800,000 values in the dataframe, in case that's related. Here is the describe() output for the dataframe:

count  795846.000000
mean  5.220350
std  20.600285
min  -3.000000
25%  0.000000
50%  0.000000
75%  1.000000
max  247.000000

Upvotes: 3

Views: 2186

Answers (2)

Andrea

Reputation: 3077

If you are interested in probability rather than probability density, I think you want to use weights instead of density. Take a look at this example to see the difference:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x':np.random.normal(loc=5, scale=10, size=80000)})

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
# Left: density (bar areas sum to 1); right: weights (bar heights sum to 1)
df.plot(kind='hist', density=True, bins=np.linspace(-100, 100, 30), ax=ax0)
df.plot(kind='hist', bins=np.linspace(-100, 100, 30), weights=np.ones(len(df))/len(df), ax=ax1)

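If you use density, the histogram is normalized so that the total area of the bars is 1. If you instead use weights of 1/len(df), it is normalized so that the heights of the bars sum to 1, which makes each height the fraction of values falling in that bin.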
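The same difference can be checked numerically without plotting. This is only a minimal sketch: it calls np.histogram directly, on the assumption that it applies the same normalization as the plots above, and the seeded generator and variable names are chosen just for illustration.

```python
import numpy as np

# Same kind of sample as in the plotting example, seeded for repeatability.
rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=10, size=80000)
bins = np.linspace(-100, 100, 30)

# density=True: heights are scaled so that sum(height * width) is 1.
density_heights, edges = np.histogram(x, bins=bins, density=True)
widths = np.diff(edges)
area = np.sum(density_heights * widths)

# weights=1/N: each height is the fraction of samples in that bin,
# so the heights themselves sum to 1 (when all samples fall inside the bins).
weighted_heights, _ = np.histogram(x, bins=bins, weights=np.ones(len(x)) / len(x))
height_sum = weighted_heights.sum()

print(area, height_sum)
```

Dividing each weighted height by its bin width gives back the density height, which is why the two plots have the same shape but different y-axis scales.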


Upvotes: 4

tidus95

Reputation: 359

You have misunderstood the meaning of density. Refer to the documentation of numpy histogram (I couldn't find the exact pandas one, but it is the same mechanism): https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html

"Density ... If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1"

This means that the areas of the histogram bars sum to one, not their heights. In particular, you get the probability of falling in a bin by multiplying the bin's height by its width.
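Applied to the question, this would explain the 0.04. The sketch below uses synthetic stand-in data (an assumption, not the asker's dataframe) with the same min of -3 and max of 247 from describe(), and assumes the plot used 10 equal-width bins, so each bin is 25 units wide. A bin holding almost every value then has a height close to 1/25 = 0.04.

```python
import numpy as np

# Synthetic stand-in for the asker's data (assumption): nearly all values are
# small, with a single -3 and a single 247 to match the reported min and max.
rng = np.random.default_rng(0)
values = np.concatenate([rng.integers(0, 5, size=9998), [-3, 247]])

# Ten equal-width bins over [-3, 247], so each bin is 25 wide.
heights, edges = np.histogram(values, bins=10, density=True)
widths = np.diff(edges)

# Probability of landing in the first bin = height * width.
first_bin_prob = heights[0] * widths[0]
total_area = np.sum(heights * widths)

print(f"bin width:        {widths[0]:.1f}")
print(f"first bin height: {heights[0]:.4f}")
print(f"first bin prob:   {first_bin_prob:.4f}")
print(f"total area:       {total_area:.4f}")
```

The first bin's height is small only because the bin is wide; multiplied by the width, it recovers the near-100% share the asker expected.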

Upvotes: 3
