Martin Schmelzer

Reputation: 23919

Histogram does not show densities

I've tried different functions and several different arguments but the histogram

hist(estimator, probability=T, br=5)

isn't showing densities on the y-axis, no matter what plotting function or argument I choose. The vector "estimator" contains 100 values between roughly 0.4 and 0.6.

To be precise: creating a hist object, recalculating its densities, and plotting it again with plot() works, but I don't want the hist object to be plotted in the first place :/

Upvotes: 2

Views: 7434

Answers (1)

Greg Snow

Reputation: 49670

When you specify probability=T (or better yet probability=TRUE, so that you don't get messed up if T is changed to something besides TRUE), the histogram is scaled so that the total area of the bars adds to 1. Since the width of your bars is quite a bit less than 1, the heights need to be greater than 1 for the areas to add to 1. This makes it easy to superpose a density estimate curve or a theoretical density curve, or to add other references.
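As a rough illustration of the area scaling, here is a minimal sketch. The data are a hypothetical stand-in for the asker's "estimator" vector (100 simulated values between roughly 0.4 and 0.6), and the seed is arbitrary.

```r
set.seed(1)                                      # arbitrary seed, for reproducibility
estimator <- rnorm(100, mean = 0.5, sd = 0.04)   # hypothetical stand-in data

h <- hist(estimator, probability = TRUE, breaks = 5)

# Bar area = height (density) * bar width; the areas sum to 1,
# so narrow bars must have heights above 1.
total_area <- sum(h$density * diff(h$breaks))

# Overlay a kernel density estimate, which lives on the same scale.
lines(density(estimator))
```

Here total_area should come out as 1 (up to floating point error), even though the individual bar heights are well above 1.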

In general you should just ignore the tick labels on the y-axis (it would be better if they were not even plotted); they just distract from the important parts of the plot.

Many people think they want the y-axis tick labels to represent the proportion (or percentage) of observations within each grouping (and that is possible with your own custom axis), but I think this is still a distraction. Consider what happens if you change the number of bars/intervals in the histogram: the overall structure of the histogram stays the same (provided you don't make too drastic a change), but the tick labels on the y-axis change, sometimes by quite a bit, so they are better ignored (or not produced in the first place).
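To see how much the proportion labels depend on the number of intervals, a small sketch along these lines may help. The sample and the two breaks values are arbitrary choices for illustration; plot = FALSE makes hist() return its object without drawing anything.

```r
set.seed(2)                # arbitrary seed
x <- rgamma(327, 5, 3)     # illustrative sample

coarse <- hist(x, breaks = 5,  plot = FALSE)
fine   <- hist(x, breaks = 40, plot = FALSE)

# Tallest bar as a proportion of observations: shrinks as bins get narrower.
max_prop_coarse <- max(coarse$counts) / sum(coarse$counts)
max_prop_fine   <- max(fine$counts)   / sum(fine$counts)

# Tallest bar on the density scale: much less sensitive to the bin width.
max_dens_coarse <- max(coarse$density)
max_dens_fine   <- max(fine$density)
```

Comparing the two pairs of values shows the proportion of the tallest bar dropping sharply with more bins, while the density heights stay in the same general range.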

If you really think that the percentages (or proportions) are needed then the code is as simple as:

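x <- rgamma(327, 5, 3)

# Draw the histogram on the count scale, suppressing the default y-axis
tmp <- hist(x, yaxt='n', ylab='Percent')
# Choose "nice" tick values on the percent scale
tmp2 <- pretty(tmp$counts/sum(tmp$counts)*100)
# Convert the percent ticks back to counts to position them on the axis
axis(2, at=tmp2*sum(tmp$counts)/100, labels=tmp2)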

That could be easily wrapped into a function if you wanted.
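One possible way to wrap the snippet above is sketched here. The name hist_percent is hypothetical (it is not part of base R), and passing extra arguments through ... is just one design choice.

```r
# hist_percent() is a hypothetical helper name, not a base R function.
hist_percent <- function(x, ..., ylab = "Percent") {
  h <- hist(x, yaxt = "n", ylab = ylab, ...)   # bars on the count scale, no default y-axis
  n <- sum(h$counts)
  pct <- pretty(h$counts / n * 100)            # "nice" tick values in percent
  axis(2, at = pct * n / 100, labels = pct)    # place them at the matching counts
  invisible(h)                                 # return the hist object without printing it
}

set.seed(3)                                    # arbitrary seed
x <- rgamma(327, 5, 3)
h <- hist_percent(x, main = "Percent of observations per bin")
```

Returning the hist object invisibly keeps the breaks and counts available for further use without cluttering the console.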

Upvotes: 8
