Matt

Reputation: 588

R ggplot2 histogram bin allocation

My problem is that when I construct histograms with ggplot2 using a bin width greater than the resolution of the data, bins sometimes contain uneven numbers of increments from the underlying data. This results in large peaks in the histogram, which give a false impression of how peaky the data are. Is there a built-in way to prevent this? Maybe by allocating increments between bins?

library(ggplot2)
library(ggplot2movies)  # provides the movies data set
m <- ggplot(movies, aes(x = rating))
# Original resolution: one bin per 0.1 rating step
print(m + geom_histogram(binwidth = 0.1) + scale_y_sqrt())
# Downsampled: 0.25 is not a multiple of the 0.1 rating step
print(m + geom_histogram(binwidth = 0.25) + scale_y_sqrt())

[Figure: Original resolution (binwidth = 0.1)]

[Figure: Downsampled (binwidth = 0.25)]
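
To make the effect concrete, here is a small sketch in base R that counts how many of the possible rating values (1.0 to 10.0 in steps of 0.1) land in each 0.25-wide bin. The bin edges (0.875, 1.125, ...) are an assumption chosen for illustration and may not match exactly what geom_histogram() picks, and the variable names are made up for this example.

```r
# Possible rating values, 1.0 to 10.0 in steps of 0.1, stored as integer
# multiples of 0.005 so that the arithmetic is exact.
ticks <- seq(200L, 2000L, by = 20L)

# Assumed bin layout for binwidth = 0.25: edges at 0.875, 1.125, 1.375, ...
first_edge <- 175L  # 0.875 in units of 0.005
bin_width  <- 50L   # 0.25 in units of 0.005

# Index of the bin that each possible rating value falls into.
bin_index <- (ticks - first_edge) %/% bin_width

# Number of distinct rating values that land in each bin.
values_per_bin <- as.vector(table(bin_index))
print(range(values_per_bin))
```

With this layout some bins cover two rating values and others cover three, so the three-value bins stand out as peaks even when the underlying ratings are smooth.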

Upvotes: 0

Views: 577

Answers (2)

Matt

Reputation: 588

My workaround for now is simply to set binwidth as a function of the data resolution (for example, a whole multiple of it) rather than specifying the number of bins.
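
A minimal sketch of that workaround, assuming the ratings are recorded in steps of 0.1. The names step and k are hypothetical, and the boundary argument is my own addition: it puts the bin edges halfway between data values so that no rating sits exactly on an edge.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data set

step <- 0.1  # assumed resolution of movies$rating
k    <- 2    # hypothetical choice: number of data steps per bin

p <- ggplot(movies, aes(x = rating)) +
  # Each bin spans exactly k rating steps; edges fall at 0.05, 0.25, 0.45, ...
  geom_histogram(binwidth = k * step, boundary = step / 2) +
  scale_y_sqrt()
print(p)
```

Every interior bin then covers the same number of rating values. The bins at the ends of the range may still cover fewer, because the number of possible values is not necessarily a multiple of k.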

Upvotes: 0

You-leee

Reputation: 560

I don't know whether there is a built-in way. geom_histogram() defaults to 30 bins, which you can override. One possible solution is to count the number of distinct x values and use that (or a fraction of it) as the number of bins:

plot(m + geom_histogram(bins = nlevels(as.factor(movies$rating))))


Upvotes: 1
