I have a vector (variable dist) for which I want to draw a histogram with a bin width of 7 units. Here's the assignment to dist:
dist <- c(
#   0-6   7-13  14-20  21-27  28-34  35-41  42-48  49-55
#   ---   ----  -----  -----  -----  -----  -----  -----
                   16,
                   20,           29,
                   17,    27,    28,
                   19,    21,    34,
      3,           14,    26,    33,    35,    44,
      1,    11,    14,    21,    29,    38,    43,    55,
      4,    12,    18,    22,    32,    35,    48,    50
)
To draw the histogram, I use hist:
hist(dist, breaks=seq(0, 56, by=7)-0.5)
which creates a histogram of the counts per bin.
So far, so good. There are three numbers between 0 and 6, two numbers between 7 and 13, and so forth, as the histogram shows.
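A quick way to confirm those counts without drawing anything is hist() with plot=FALSE, which only returns the "histogram" object. A minimal sketch (the dist vector is repeated so the snippet runs on its own); the half-integer breaks ensure that no integer value can land exactly on a bin boundary:

```r
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34, 3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55, 4, 12, 18, 22, 32, 35, 48, 50)

# Breaks at -0.5, 6.5, 13.5, ..., 55.5: every integer falls strictly inside a bin
brks <- seq(0, 56, by = 7) - 0.5

# plot = FALSE computes the histogram object without drawing it
h <- hist(dist, breaks = brks, plot = FALSE)

h$breaks  # -0.5  6.5 13.5 20.5 27.5 34.5 41.5 48.5 55.5
h$counts  # 3 2 7 5 6 3 3 2
```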
Now I use hist with the prob=TRUE parameter (hist(dist, breaks=seq(0, 56, by=7)-0.5, prob=TRUE)), which puts densities instead of counts on the y axis.
Instead of a density on the y axis, I'd like it to show the probability of each bin. For example, the bin with the values 21 through 27 has a height (or density) of 0.02304147, calculated as follows:
dens_21_27 <- length(dist[dist > 20.5 & dist < 27.5])/length(dist)/7
This can be verified by drawing a line with this height:
lines(c(-5, 56), c(dens_21_27, dens_21_27), col="#FF770070")
which draws a horizontal line level with the top of the 21-27 bar.
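The height drawn by prob=TRUE is also stored in the density component of the object hist() returns, and it equals count / (n * bin width). A small check, assuming the same data and breaks as in the question:

```r
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34, 3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55, 4, 12, 18, 22, 32, 35, 48, 50)
h <- hist(dist, breaks = seq(0, 56, by = 7) - 0.5, plot = FALSE)

n <- length(dist)
width <- diff(h$breaks)  # all bins are 7 units wide here

# Density = relative frequency divided by bin width
manual_density <- h$counts / (n * width)
all.equal(manual_density, h$density)  # TRUE

# The fourth bin is 21-27, so its density matches dens_21_27 from the question
dens_21_27 <- length(dist[dist > 20.5 & dist < 27.5]) / length(dist) / 7
all.equal(h$density[4], dens_21_27)   # TRUE
```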
However, I'd like the y axis to show the probability that a number falls into the 21 through 27 bin, which is
length(dist[dist > 20.5 & dist < 27.5])/length(dist)
or 0.1612903 (5 of the 31 values).
Is this possible?
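Because every bin here is 7 units wide, the probability of a bin is its density multiplied by the bin width, which is the same as its count divided by n. A sketch of that conversion, assuming equal-width bins as in the question:

```r
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34, 3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55, 4, 12, 18, 22, 32, 35, 48, 50)
h <- hist(dist, breaks = seq(0, 56, by = 7) - 0.5, plot = FALSE)

# Probability of each bin: density times bin width, i.e. count / n
probs <- h$density * diff(h$breaks)
all.equal(probs, h$counts / sum(h$counts))  # TRUE
sum(probs)                                  # 1 (up to floating-point rounding)
round(probs[4], 7)                          # 0.1612903, i.e. 5/31
```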
Here's a wrapper I've used in the past to convert the y-axis values to probabilities.
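probabilityplot <- function(x, ..., prob = TRUE, ylab = "Probability") {
  # Draw the density-scaled histogram but suppress its default y axis
  xx <- hist(x, yaxt = "n", prob = prob, ylab = ylab, ...)
  bin.sizes <- diff(xx$breaks)
  # density * width equals probability only if all bins share one width
  if (any(bin.sizes != bin.sizes[1])) stop("bin sizes are not the same")
  # Redraw the y axis: same tick positions, labels rescaled to probabilities
  marks <- axTicks(2)
  axis(2, at = marks, labels = marks * bin.sizes[1])
  # Return the per-bin probabilities along with the usual hist() output
  xx$probabilities <- xx$density * bin.sizes[1]
  invisible(xx)
}
probabilityplot(dist, breaks = seq(0, 56, by = 7) - 0.5)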
Histograms were designed to estimate the density of continuous random variables, hence the preference for density over probability.
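To see why density is the natural scale, consider unequal bin widths: the area of each bar (density times width) is then the bin probability, while the heights alone would mislead. An illustration with hypothetical unequal breaks chosen only for this example:

```r
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34, 3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55, 4, 12, 18, 22, 32, 35, 48, 50)

# Hypothetical unequal breaks: bin widths 14, 7 and 35
h <- hist(dist, breaks = c(-0.5, 13.5, 20.5, 55.5), plot = FALSE)

h$counts                             # 5 7 19
areas <- h$density * diff(h$breaks)  # bar areas = bin probabilities
areas                                # 5/31, 7/31, 19/31
sum(areas)                           # 1 (up to floating-point rounding)
```

Here the 14-20 bar is the tallest even though the wide 21-55 bin holds far more observations (19 vs 7); only the areas sum to 1.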
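You can bin the values using the histogram breaks and draw a barplot of the bin proportions.
bs <- hist(dist, breaks=seq(0, 56, by=7)-0.5, plot=FALSE)$breaks
# Assign each value to its bin and divide the bin counts by n
probs <- table(cut(dist, bs)) / length(dist)
# las=2 turns the bin labels perpendicular to the axis so they fit
barplot(probs, ylab="Probability", las=2)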
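A related variant, a different technique from the barplot above rather than part of this answer, keeps the histogram styling: compute the histogram with plot=FALSE, overwrite its counts with proportions, and plot the modified object. It relies on plot() for a "histogram" object drawing the counts component when freq=TRUE, which is the default for equal-width bins:

```r
dist <- c(16, 20, 29, 17, 27, 28, 19, 21, 34, 3, 14, 26, 33, 35, 44,
          1, 11, 14, 21, 29, 38, 43, 55, 4, 12, 18, 22, 32, 35, 48, 50)
h <- hist(dist, breaks = seq(0, 56, by = 7) - 0.5, plot = FALSE)

# Replace raw counts with the proportion of observations in each bin
h$counts <- h$counts / sum(h$counts)

# plot.histogram draws h$counts as bar heights when freq = TRUE
plot(h, freq = TRUE, ylab = "Probability", main = "Histogram of dist")
```

No axis relabelling is needed, but the object's counts component no longer holds counts, so keep a copy if you need them later.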