How to show outliers in a ggplot histogram?

Question

I have some data where most of the values (about 10 million in the real data) are close to zero, but there are a few outliers. I want to show the distribution with a histogram. The outliers matter for the analysis, so they should be visible in the histogram. Using a logarithmic scale on the y-axis works quite well, but one problem remains: the y-axis now starts at 1 (since $\log_{10}(1) = 0$), so bins with exactly one element are drawn with zero height and cannot be distinguished from empty bins. Additionally, I get a warning message about infinite values for the empty bins (which is correct, since $\log(0) = -\infty$).

I made a little code example:

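library(ggplot2)
set.seed(123)

# 10,000 values concentrated near zero, plus three outliers
data <- data.frame(x=c(abs(rnorm(10000)), 5.25, 5.5, 7.5))

# Histogram with unit-width bins aligned at 0 and a log10 y-axis
ggplot(data, aes(x)) + 
    geom_histogram(binwidth=1, boundary=0) + 
    scale_y_log10()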

The two outliers between 5 and 6 show up clearly, but the one at 7.5 cannot be distinguished from the two empty bins. How do I tell ggplot to start drawing the bins from a y-value smaller than 1?
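
The effect can be reproduced outside ggplot2 with a small base-R sketch. This is only an illustration, not what ggplot2 does internally: it assumes the bins are the right-closed unit intervals with breaks 0 through 8, and the names `breaks`, `counts`, and `log_counts` are not part of the original example. It tabulates the bin counts and takes their base-10 logarithm, which is the height the log-scaled bars end up with.

```r
set.seed(123)
x <- c(abs(rnorm(10000)), 5.25, 5.5, 7.5)

# Assumed bin edges: unit-width bins starting at 0 (mirrors binwidth = 1, boundary = 0)
breaks <- 0:8
counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))

# On a log10 axis, a bin with one element maps to 0 and an empty bin to -Inf
log_counts <- log10(as.numeric(counts))

print(data.frame(bin = names(counts),
                 count = as.numeric(counts),
                 log10_count = log_counts))
```

In the printed table, the bin containing 7.5 has a count of 1 and therefore a log height of 0, so its bar collapses onto the axis baseline, while empty bins produce `-Inf`, which is what triggers the warning.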

PS: Does Stack Overflow not support MathJax for displaying math?
