Camilla

Reputation: 113

density/frequency and probability in hist()

I have used the code

hist(x, probability=TRUE)

which gives me a y-axis from 0 to 2 labelled "Density". I don't understand what this means. Does it integrate to 1, sum to 1, or what is the y-value equal to? The documentation says "freq = NULL, probability = !freq", but that does not make sense to me. If I don't use probability=TRUE, I get "Frequency" on the y-axis, but the shape of the plot is the same.

Upvotes: 1

Views: 5704

Answers (1)

Maksim Gayduk

Reputation: 1082

You can save your histogram to a variable and take a look at it.

x <- rnorm(1000)
h <- hist(x)


h

$breaks
 [1] -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0

$counts
 [1]   2   8  24  42  87 169 188 189 146  78  38  23   5   0   1

$density
 [1] 0.004 0.016 0.048 0.084 0.174 0.338 0.376 0.378 0.292 0.156 0.076 0.046 0.010 0.000 0.002

$mids
 [1] -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25  0.25  0.75  1.25  1.75  2.25  2.75  3.25  3.75

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

By default it plots frequencies (available as h$counts), i.e. the number of points that fall within each interval. The total number of points equals the length of the vector, which you can check with

sum(h$counts)
[1] 1000
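As a rough cross-check of what the counts mean, the bins can be rebuilt by hand. This is only a sketch: it assumes hist()'s default settings (right-closed intervals, lowest value included), and set.seed() and the names bins and manual_counts are illustrative additions, not part of the original answer.

```r
set.seed(1)                      # illustrative seed so the run is repeatable
x <- rnorm(1000)
h <- hist(x, plot = FALSE)       # compute the histogram without drawing it

# Assign each point to an interval using the same breaks as hist()
bins <- cut(x, breaks = h$breaks, right = TRUE, include.lowest = TRUE)
manual_counts <- as.vector(table(bins))

# Compare the hand-made counts with the ones stored in the histogram object
identical(manual_counts, h$counts)

# Every point lands in exactly one bin
sum(h$counts) == length(x)
```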

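If you specify probability=TRUE (equivalently freq=FALSE), it plots the density of each interval instead: the fraction of points that fall within the interval divided by the interval's width. This is why a bar can be taller than 1 when the bins are narrow. The heights themselves do not sum to 1; the bar areas (height times width) do, so the histogram integrates to 1. With equal-width bins the densities are just the counts rescaled by a constant, which is why the shape of the plot does not change. In our case, the bar width is 0.5, so we get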

sum(h$density*0.5)
[1] 1
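The relationship between counts and density can also be checked without hard-coding the bar width. The following is a minimal sketch, assuming the default hist() behaviour; set.seed() and the names widths and dens are illustrative additions, not part of the original answer.

```r
set.seed(42)                     # illustrative seed so the run is repeatable
x <- rnorm(1000)
h <- hist(x, plot = FALSE)       # compute the histogram without drawing it

widths <- diff(h$breaks)         # width of each bin, taken from the breaks

# Density = share of points in the bin, divided by the bin width
dens <- h$counts / (sum(h$counts) * widths)
all.equal(dens, h$density)

# Total area of the bars: height times width, summed over all bins
sum(h$density * widths)
```

Using diff(h$breaks) rather than a fixed 0.5 means the same check should also work when the bins have unequal widths.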

Upvotes: 2
