eran

Reputation: 15136

Why does the hist() function not have area one?

When using hist() in R with freq=FALSE, I expect to get densities. However, the values in h$density do not sum to one. They differ from the counts, but I still need to normalize them myself.

For example:

> h = hist(c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5), freq=FALSE)
> h$density
  [1] 0.13636364 0.15909091 0.09090909 0.09090909 0.02272727
> sum(h$density)
  [1] 0.5
> h$density/sum(h$density)
  [1] 0.27272727 0.31818182 0.18181818 0.18181818 0.04545455

Upvotes: 4

Views: 2736

Answers (4)

csgillespie

Reputation: 60462

If you examine the rest of the histogram output, you will notice that the bars have width 2:

$breaks
[1]  0  2  4  6  8 10

Hence you should multiply sum(h$density) by 2 to get an area equal to one. You can see this clearly if you look at the histogram.
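
As a minimal sketch of that check, using the data from the question: the names `width` and `area` are introduced here for illustration only, and `plot = FALSE` is used so that no figure is drawn (the density component is computed either way).

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, plot = FALSE)

# All bins share one width here, so unique() collapses it to a single value
width <- unique(diff(h$breaks))   # 2

# Area = sum of bar heights times the common bar width
area <- sum(h$density) * width    # 1
```

This shortcut only applies when every bin has the same width.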


Upvotes: 7

MrFlick

Reputation: 206197

The density is not the same as the probability. For a histogram, the density is the height of a bar, and the probability is the area of the bar. You need to multiply the height by the width to get the area. Try:

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
hh <- hist(x, probability = TRUE)
sum(diff(hh$breaks) * hh$density)
# [1] 1

This works because breaks contains the start and end points of each bin, so taking the difference between consecutive values gives the width of each bin. You can also use with() to grab both of those values more easily.

x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
with(hist(x, probability = TRUE), sum(diff(breaks) * density))
# [1] 1
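
The same width-times-height sum should also hold when the bins are not equally wide, which is where a fixed factor no longer helps. A small sketch, assuming a hand-picked set of unequal breaks (the break points and the names `hu`, `naive` and `area_u` are illustrative, not from the question):

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# Unequal bin widths: 1, 2, 3 and 4 units
hu <- hist(x, breaks = c(0, 1, 3, 6, 10), plot = FALSE)

# Summing the heights alone ignores the differing widths
naive <- sum(hu$density)

# Weighting each height by its own bin width recovers the total area
area_u <- sum(diff(hu$breaks) * hu$density)   # 1
```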

Upvotes: 1

Backlin

Reputation: 14842

# Bin widths are the differences between consecutive breaks
sum(h$density*(h$breaks[-1] - h$breaks[-length(h$breaks)]))

[1] 1

Upvotes: 1

NPE

Reputation: 500317

The area of the histogram is, in fact, 1.0. What you're not taking into account is that every bar is two units wide:

> h$breaks
[1]  0  2  4  6  8 10
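
To see where the width enters, the densities can be rebuilt from the counts: each count is divided by the number of observations times the bin width. A brief sketch using the question's data (the name `manual` is illustrative):

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
h <- hist(x, plot = FALSE)

# density = count / (number of observations * bin width)
manual <- h$counts / (sum(h$counts) * diff(h$breaks))

all.equal(manual, h$density)   # TRUE
```

Because each height is already divided by the bin width, it is the areas, not the heights, that add up to one.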

Upvotes: 1
