Reputation: 21
I am trying to make a histogram in ggplot2, and I want the y axis to be both logarithmic and labelled in percentages (0.1%, 1%, 10%, etc.).
My dataset has 60,000 samples, but I hope this excerpt captures it:
-0.0651
-0.0649
-0.0661
-0.0652
-0.058
-0.045
-0.022
-0.001
+0.028
+0.039
-0.022
-0.0651
-0.0652
I can do each of these things independently (making the y axis logarithmic, or making it show percentages). When I just do percentages, I use the following code:
ggplot(aphist, aes(aphist$baseline1CW_Vm_samp)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth=0.0008)
And I get this output, which has the percentages on it:
But I now want to make the y axis logarithmic. When I do that the way I've been taught, using the following code:
ggplot(aphist, aes(aphist$baseline1CW_Vm_samp)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth=0.0008) +
  scale_y_continuous(trans = 'log10')
I suddenly get a very strange, flipped upside down plot:
I suspect it is because there are some samples which are 0 or close to 0 but I'm unsure. Any help would be much appreciated!
Upvotes: 2
Views: 1072
Reputation: 15907
Bar plots in ggplot are drawn so that bars for positive values point upwards from y = 0, while bars for negative values point downwards from the same baseline. You are plotting proportions (count divided by total count) on the y-axis, which lie between 0 and 1 by definition. On a log scale the baseline sits at log10(1) = 0, and the logarithm of any number between 0 and 1 is negative, so all your bars point downwards.
I don't know of a way to let ggplot do what you want automatically. However, you can achieve your goal by plotting counts instead of proportions. This works because non-empty bins have counts of 1 or larger, so their logarithm is never negative. The exception is, of course, bins with a count of 0. The logarithm of 0 diverges and those values won't be plotted, which is equivalent to plotting bars with zero height. Note that bins with exactly one observation also end up with zero height, because log10(1) = 0.
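As a quick check of the sign argument above, here is a minimal sketch in base R. The example values in `props` are made up for illustration and are not taken from the question's data.

```r
# Illustrative values only: proportions below 1, exactly 1, and a count above 1
props <- c(0.001, 0.01, 0.1, 1, 10)

# log10 is negative below 1, zero at 1, and positive above 1
log_props <- log10(props)
log_props
```

Everything below 1 maps to a negative number, which is why bars for proportions hang down from the baseline, while bars for counts larger than 1 rise above it.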
Since I don't have your data, I will show a solution using the built-in dataset faithful. It should be easy enough to adapt that to your data.
As a demonstration of what I mean, I first show you an example, where the y-axis is not logarithmic. This has the advantage that the plot can be easily created without any tricks:
library(ggplot2)
bw <- 2              # bin width
n <- nrow(faithful)  # number of data points
ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = stat(density)), binwidth = bw)
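Note that I have used stat(density) instead of (..count..)/sum(..count..). The stat() notation is the more modern replacement for the ..name.. syntax (ggplot2 3.4.0 and later prefer after_stat(density)). Be aware that density is count / (n * binwidth), so it differs from your proportion by a factor of the bin width. I have also stored the binwidth and the number of data points in variables, since I will use those values often. The following code gives the same image: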
ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +
  scale_y_continuous(
    breaks = seq(0, 0.05, 0.01) * (bw * n),
    labels = function(x) x / (bw * n)
  )
Note that this time I plot counts, not density. However, I use the arguments breaks and labels in scale_y_continuous() to redefine the positions of the breaks and their labels such that they show density nevertheless.
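The relabelling relies on density being count / (n * binwidth) for equal-width bins. A small sketch to check this on the faithful data, assuming ggplot2's layer_data() helper is used to pull out the computed bin statistics (the names `p` and `bins` are only illustrative):

```r
library(ggplot2)

bw <- 2
n <- nrow(faithful)

p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw)

# layer_data() returns the values computed by the stat, including count and density
bins <- layer_data(p)

# Compare the computed density with count / (n * binwidth), up to floating-point tolerance
all.equal(bins$density, bins$count / (bw * n))
```

If the two columns agree, multiplying the desired density breaks by bw * n places them at the right counts, and dividing by the same factor in labels turns them back into densities.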
The same principle can be applied to the log-plot. First, I create the log-plot the same way you did, such that you can see that I end up with the same problem: the bars point downwards.
ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = stat(density)), binwidth = 2) +
  scale_y_log10()
But by plotting counts and redefining the labels, you can get a more appropriate image:
ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +
  scale_y_log10(
    # start at 0.01: a break at 0 cannot be shown on a log scale
    breaks = seq(0.01, 0.05, 0.01) * (bw * n),
    labels = function(x) x / (bw * n)
  )
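The same trick can be adapted to show percentages of all observations, as asked in the question, rather than density. The following is a sketch under a few assumptions: it uses scales::percent() for formatting, `pct_labels` is a hypothetical helper name, and the breaks at 1%, 2%, 5% and 10% were chosen to suit the faithful data, not the question's data.

```r
library(ggplot2)

bw <- 2
n <- nrow(faithful)

# Hypothetical helper: turn a count break back into a share of all observations
pct_labels <- function(x) scales::percent(x / n, accuracy = 1)

p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +
  scale_y_log10(
    name = "share of observations",
    breaks = c(0.01, 0.02, 0.05, 0.1) * n,  # 1%, 2%, 5%, 10% expressed as counts
    labels = pct_labels
  )
p
```

For a proportion the scaling factor is just n, not bw * n, because the bin width only enters when converting counts to density. Since the bars still start at a count of 1, percentages below 1 / n cannot be shown this way.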
Upvotes: 2