René Nyffenegger

Reputation: 40603

How do I create a histogram with a probability y-axis rather than a density y-axis?

I have a vector (variable dist) that I want to plot as a histogram with a bin width of 7 units. Here's the assignment to dist:

dist <- c(
#  0-6  7-13  14-20  21-27  28-34  35-41  42-48  49-55
#  ---  ----  -----  -----  -----  -----  -----  -----
                 16,
                 20,           29,
                 17,    27,    28,
                 19,    21,    34,
     3,          14,    26,    33,    35,    44,
     1,   11,    14,    21,    29,    38,    43,    55,
     4,   12,    18,    22,    32,    35,    48,    50
)

In order to draw the histogram, I use hist:

hist(dist, breaks=seq(0, 56, by=7)-0.5)

which creates this graphic:

[Image: histogram of dist with counts on the y-axis]

So far, so good. There are three numbers between 0 and 6, two numbers between 7 and 13 and so forth, as is shown by the histogram.

Now I call hist with the prob=TRUE parameter, which creates the following graph:

[Image: histogram of dist with density on the y-axis]

Instead of a density on the y-axis, I'd like it to show the probability for each bin. For example, the bin with the values 21 through 27 has a height (or density) of 0.02304147, calculated as follows:

dens_21_27 <- length(dist[dist > 20.5 & dist < 27.5])/length(dist)/7

This can be verified by drawing a line with this height:

lines(c(-5, 56), c(dens_21_27, dens_21_27), col="#FF770070")

which draws

[Image: the density histogram with a horizontal line at the height of the 21-27 bin]

Yet, I'd like the y-axis to show the probability for a number to fall into the 21 through 27 bin, which is

length(dist[dist > 20.5 & dist < 27.5])/length(dist)

or 0.1612903.

Is this possible somehow?

Upvotes: 2

Views: 7923

Answers (2)

MrFlick

Reputation: 206606

Here's a wrapper I've used in the past to coerce the values to probabilities.

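# Draw a histogram whose y-axis is labelled with per-bin probabilities.
probabilityplot <- function(x, ..., prob = TRUE, ylab = "Probability") {
    # Plot densities, but suppress the default y-axis.
    xx <- hist(x, yaxt = "n", prob = prob, ylab = ylab, ...)
    bin.sizes <- diff(xx$breaks)
    # Relabelling only works for equal-width bins (compare with a small tolerance).
    if (any(abs(bin.sizes - bin.sizes[1]) > 1e-8 * bin.sizes[1])) stop("bin sizes are not the same")
    # Relabel the density ticks as probability = density * bin width.
    marks <- axTicks(2)
    axis(2, at = marks, labels = marks * bin.sizes[1])
    xx$probabilities <- xx$density * bin.sizes[1]
    invisible(xx)
}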

probabilityplot(dist, breaks=seq(0, 56, by=7)-0.5)

[Image: histogram of dist with a probability y-axis]

Histograms were designed to estimate the density of continuous random variables, hence the preference for density over probability.
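The relationship between the two scales can be checked numerically. The following is a minimal sketch, assuming the dist vector and breaks from the question; the names h, widths and probs are only illustrative. It multiplies each bin's density by its width and compares the result with the relative frequencies.

```r
# Data and breaks as given in the question
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34,
          3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55,
          4, 12, 18, 22, 32, 35, 48, 50)
breaks <- seq(0, 56, by = 7) - 0.5

# Compute the histogram without drawing it
h <- hist(dist, breaks = breaks, plot = FALSE)

# Probability of a bin = density * bin width
widths <- diff(h$breaks)
probs <- h$density * widths

# The same numbers are obtained as relative frequencies
all.equal(probs, h$counts / sum(h$counts))

# The bin probabilities add up to 1
sum(probs)

# Probability of the 21-27 bin (the fourth bin)
probs[4]   # 0.1612903
```

Because all bins here are 7 units wide, the probability axis is just the density axis scaled by 7, which is why relabelling the ticks is sufficient.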

Upvotes: 2

Rorschach

Reputation: 32466

You can bin the groups by the histogram breaks and make a barplot.

bs <- hist(dist, breaks=seq(0, 56, by=7)-0.5, plot=FALSE)$breaks
probs <- table(cut(dist, bs)) / length(dist)
barplot(probs, ylab="Probability", las=2)

[Image: bar plot of the per-bin probabilities]
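A barplot treats the bins as categories, so the x-axis shows interval labels rather than a numeric scale. If a numeric axis is preferred, one possible variation (not part of the answer above, and only a sketch) is to overwrite the counts stored in the histogram object with relative frequencies and plot that object with freq=TRUE. This assumes equal-width bins, as in the question.

```r
# Data as given in the question
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34,
          3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55,
          4, 12, 18, 22, 32, 35, 48, 50)

# Compute the histogram without drawing it
h <- hist(dist, breaks = seq(0, 56, by = 7) - 0.5, plot = FALSE)

# Replace the counts with relative frequencies (per-bin probabilities)
h$counts <- h$counts / sum(h$counts)

# With freq = TRUE, plot() draws the (now rescaled) counts
plot(h, freq = TRUE, ylab = "Probability", main = "Histogram of dist")
```

After this, h$counts no longer holds counts, so the modified object should not be reused where real frequencies are expected.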

Upvotes: 1
