Reputation: 588
My problem is that when I build histograms in ggplot2 with a bin width larger than the resolution of the data, the bins sometimes cover unequal numbers of increments of the underlying data. This produces large spikes in the histogram, which give a false impression of how peaky the data are. Is there a built-in way to prevent this? Perhaps by allocating increments between bins?
library(ggplot2)
library(ggplot2movies)
m <- ggplot(movies, aes(x = rating))
# Original resolution: one bin per 0.1 rating step
print(m + geom_histogram(binwidth = 0.1) + scale_y_sqrt())
# Downsampled: 0.25 is not a whole multiple of 0.1, so bins cover 2 or 3 steps
print(m + geom_histogram(binwidth = 0.25) + scale_y_sqrt())
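To see where the spikes come from, here is a quick base-R check of how many distinct rating values fall into each 0.25-wide bin. It is only a sketch: it assumes the ratings sit on a 0.1 grid from 1 to 10, and the bin edges are picked by hand for illustration, so they may not match the edges ggplot2 actually chooses.

```r
# Rating values on a 0.1 grid from 1 to 10 (assumed to mirror movies$rating).
grid <- round(seq(1, 10, by = 0.1), 1)

# Bins 0.25 wide; the edge alignment is an assumption made for illustration.
breaks <- seq(0.875, 10.125, by = 0.25)

# Number of distinct rating values that land in each bin.
per_bin <- table(cut(grid, breaks))

# Tabulate how many bins hold 2 values and how many hold 3.
print(table(per_bin))
```

Some bins cover two rating values and others cover three, so the three-value bins stand out as spikes even when the underlying distribution is smooth.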
Upvotes: 0
Views: 577
Reputation: 588
My workaround for now is simply to set binwidth as a function of the data resolution, rather than specifying the number of bins.
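A minimal sketch of that workaround, with some assumptions: the data here are a made-up stand-in for movies$rating on a 0.1 grid, the names res, k, bw and bnd are just illustrative, and the bin edges are placed halfway between data values via the boundary argument of geom_histogram() so that no value sits on an edge.

```r
library(ggplot2)

# Hypothetical stand-in for movies$rating: every value on a 0.1 grid from 1 to 10,
# repeated with bell-shaped (made-up) frequencies.
grid <- round(seq(1, 10, by = 0.1), 1)
rating <- rep(grid, times = round(1000 * dnorm(grid, mean = 6, sd = 1.5)) + 1)

# Data resolution: the smallest gap between distinct values,
# rounded to absorb floating-point noise.
res <- signif(min(diff(sort(unique(rating)))), 6)

k <- 2                        # increments per bin (an arbitrary choice)
bw <- k * res                 # bin width is a whole multiple of the resolution
bnd <- min(rating) - res / 2  # bin edges fall halfway between data values

p <- ggplot(data.frame(rating = rating), aes(x = rating)) +
  geom_histogram(binwidth = bw, boundary = bnd) +
  scale_y_sqrt()
print(p)

# Sanity check in base R: distinct data values covered by each bin.
breaks <- seq(bnd, max(rating) + bw, by = bw)
values_per_bin <- table(cut(sort(unique(rating)), breaks))
print(table(values_per_bin))
```

In this stand-in data there are 91 distinct values, so with k = 2 every bin covers two values except the last, which holds the single leftover. Picking a k that divides the number of increments avoids that.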
Upvotes: 0
Reputation: 560
I don't know whether there is a built-in way. geom_histogram() defaults to 30 bins, which you can override. One possible solution is to count the number of distinct x values and use that count (or a fraction of it) as the number of bins:
plot(m + geom_histogram(bins = nlevels(as.factor(movies$rating))))
Upvotes: 1