Reputation: 15136
When using hist() in R and setting freq=FALSE, I should get densities. However, I do not. The numbers differ from the plain counts, but they do not sum to 1, so I still need to normalize.
For example:
> h = hist(c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5), freq=FALSE)
> h$density
[1] 0.13636364 0.15909091 0.09090909 0.09090909 0.02272727
> sum(h$density)
[1] 0.5
> h$density/sum(h$density)
[1] 0.27272727 0.31818182 0.18181818 0.18181818 0.04545455
Upvotes: 4
Views: 2736
Reputation: 60462
If you examine the rest of the histogram output, you will notice that the bars have width 2:
$breaks
[1] 0 2 4 6 8 10
Hence you should multiply sum(h$density) by 2 to get an area equal to one. You can see this clearly if you look at the histogram.
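As a quick check of the point above, here is a minimal sketch using the data from the question. The variable name bin_width is only illustrative, and plot = FALSE is used on the assumption that only the computed values are wanted; hist() fills in the density component whether or not anything is drawn.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, plot = FALSE)

# All bins share the same width here, so diff() collapses to one value
bin_width <- unique(diff(h$breaks))

# Total area = sum of bar heights times the common bar width
total_area <- sum(h$density) * bin_width
total_area
```

This shortcut relies on every bin having the same width; with unequal breaks each density has to be multiplied by its own bin width.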
Upvotes: 7
Reputation: 206197
The density is not the same as the probability. The density for a histogram is the height of the bar. The probability is the area of the bar. You need to multiply the height by the width to get the area. Try:
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
hh <- hist(x, probability = TRUE)
sum(diff(hh$breaks) * hh$density)
# [1] 1
This works because breaks contains the start and end points of each bin, so taking the difference between consecutive values gives the width of each bin. You can also use with() to more easily grab both of those values.
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
with(hist(x, probability = TRUE), sum(diff(breaks) * density))
# [1] 1
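To connect this back to the question, the per-bin areas can also be kept separate instead of summed. The sketch below is only illustrative (bin_prob is a hypothetical name, and plot = FALSE is assumed because no drawing is needed); it shows that height times width for each bin equals that bin's share of the observations.

```r
x <- c(1,2,1,3,1,4,5,4,5,8,2,4,1,7,6,10,7,4,3,7,3,5)
h <- hist(x, plot = FALSE)

# Area of each bar: height (density) times width (difference of breaks)
bin_prob <- h$density * diff(h$breaks)
bin_prob

# The same values as the fraction of observations falling in each bin
isTRUE(all.equal(bin_prob, h$counts / sum(h$counts)))
```

For this data the per-bin areas should match the values the question obtained from h$density/sum(h$density), because all bins have equal width.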
Upvotes: 1
Reputation: 14842
sum(h$density*(h$breaks[-1] - h$breaks[-length(h$breaks)]))
[1] 1
Upvotes: 1
Reputation: 500317
The area of the histogram is, in fact, 1.0. What you're not taking into account is that every bar is two units wide:
> h$breaks
[1] 0 2 4 6 8 10
Upvotes: 1