bananabreaded

Reputation: 21

How do I make the y axis of a histogram both logarithmic and percentage?

I am trying to make a histogram in ggplot2, and I'm trying to make the y axis both logarithmic and showing percentages, to get it as 0.1%, 1%, 10% etc.

My dataset has 60,000 samples, but I hope this excerpt captures it:

    -0.0651
    -0.0649
    -0.0661
    -0.0652
    -0.058
    -0.045
    -0.022
    -0.001
    +0.028
    +0.039
    -0.022
    -0.0651
    -0.0652

I can do both of these things (making the y axis logarithmic, and making it show percentages) independently. When I only want percentages, I use the following code:

ggplot(aphist, aes(baseline1CW_Vm_samp)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 0.0008)

And I get this output, which has the percentages on it:

[Image: histogram with percentages on the y axis]

But I now want to make the y axis logarithmic. When I do that the way I've been taught, using the following code:

ggplot(aphist, aes(baseline1CW_Vm_samp)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 0.0008) +
  scale_y_continuous(trans = 'log10')

I suddenly get a very strange, flipped upside down plot:

[Image: histogram with the bars pointing downwards]

I suspect it is because there are some samples which are 0 or close to 0 but I'm unsure. Any help would be much appreciated!

Upvotes: 2

Views: 1072

Answers (1)

Stibu

Reputation: 15907

Why the bars point downwards and what to do about it

Bar plots in ggplot are created such that bars for positive values point upwards starting at y = 0, while bars for negative values point downwards from the same axis. You are plotting proportions (count divided by total count) on the y-axis, which lie between 0 and 1 by definition. With a log10 axis, the bars are drawn on the transformed values, where a proportion of 1 maps to log10(1) = 0. The logarithm of a number below 1 is negative, and therefore all your bars point downwards.

I don't know of a way to let ggplot do what you want automatically. However, you can achieve your goal by plotting counts instead of proportions or density. This works because counts in non-empty bins are 1 or larger, which means that their logarithm is never negative. The exception is, of course, a count of 0. The logarithm of 0 diverges and those values won't be plotted, which is equivalent to plotting bars with zero height. One caveat: a bin with a count of exactly 1 also gets a bar of zero height, since log10(1) = 0, so it looks the same as an empty bin.
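To make the sign argument concrete, here is a small base-R check. The numbers are made up for illustration and are not taken from the question's data:

```r
# Illustrative values only; not taken from the question's data.
props  <- c(0.001, 0.01, 0.1, 0.5)  # proportions, all below 1
counts <- c(0, 1, 10, 100)          # raw bin counts

log_props  <- log10(props)   # all negative, so bars are drawn downwards from 0
log_counts <- log10(counts)  # -Inf for the empty bin, 0 or larger otherwise

print(log_props)
print(log_counts)
```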

A simple example

Since I don't have your data, I will show a solution using the built-in dataset faithful. It should be easy enough to adapt that to your data.

To demonstrate what I mean, I first show an example where the y-axis is not logarithmic. This has the advantage that the plot can be created easily, without any tricks:

library(ggplot2)

bw <- 2              # bin width
n <- nrow(faithful)  # number of data points
ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = stat(density)), binwidth = bw)

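Note that I have used stat(density) instead of (..count..)/sum(..count..). This is the more modern notation, although the two are not quite identical: density is the proportion divided by the bin width, so the bar heights differ by a constant factor while the shape of the histogram stays the same. (Since ggplot2 3.3.0, after_stat(density) is the preferred spelling.) I have also stored the bin width and the number of data points in variables, since I will use those values often. The following code gives exactly the same image: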

ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +
  scale_y_continuous(
    breaks = seq(0, 0.05, 0.01) * (bw * n),
    labels = function(x) x / (bw * n)
  )

Note that this time I plot counts, not density. However, I use the arguments breaks and labels in scale_y_continuous() to redefine the positions of the breaks and their labels such that they show density nevertheless.
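The factor bw * n comes from the definition of density: density = count / (n * bin width). Here is a quick way to check this, sketched with base R's hist() rather than ggplot2. The bin edges are my own choice and need not coincide with the bins ggplot2 picks:

```r
bw <- 2
n <- nrow(faithful)

# Bin edges of width bw that cover the full range of the data.
edges <- seq(
  floor(min(faithful$waiting) / bw) * bw,
  ceiling(max(faithful$waiting) / bw) * bw + bw,
  by = bw
)
h <- hist(faithful$waiting, breaks = edges, plot = FALSE)

# Density and count differ only by the constant factor bw * n.
density_from_counts <- h$counts / (bw * n)
print(all.equal(h$density, density_from_counts))
```

So a break at density d sits at count d * bw * n, and dividing a count by bw * n gives back the density for the label.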

Solution with logarithmic y-axis

The same principle can be applied to the log-plot. First, I create the log-plot the same way you did, such that you can see that I end up with the same problem: the bars point downwards.

ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = stat(density)), binwidth = bw) +
  scale_y_log10()

But by plotting counts and redefining the labels, you can get a more appropriate image:

ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +
  scale_y_log10(
    breaks = seq(0, 0.05, 0.01) * (bw * n),
    labels = function(x) x / (bw * n)
  )

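To get tick labels like 1% and 10%, as asked in the question, the same trick should work with proportions instead of density: place the breaks at the desired proportions times n and label them as percentages. The following is only a sketch on the faithful data. The names pct_breaks and pct_label are made up for this example, and signif() is used merely to hide floating-point noise in the labels:

```r
library(ggplot2)

bw <- 2
n <- nrow(faithful)

# Desired tick positions as proportions: 0.1%, 1%, 10%, 100%.
pct_breaks <- c(0.001, 0.01, 0.1, 1)

# Hypothetical helper: convert a count back to a percentage label.
pct_label <- function(x) paste0(signif(100 * x / n, 3), "%")

p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +   # plot counts, so the logs are never negative
  scale_y_log10(
    breaks = pct_breaks * n,        # proportions expressed as counts
    labels = pct_label
  )
print(p)
```

With only 272 observations in faithful, a single observation already corresponds to about 0.37%, so the 0.1% tick falls below the lowest bar and is unlikely to be drawn. With 60,000 samples it should be within range.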

Upvotes: 2
