CptNemo

Reputation: 6755

How to improve the aspect of ggplot histograms with log scales and discrete values

I am trying to improve the clarity and appearance of a histogram of discrete values that I need to plot on a log scale.

Please consider the following MWE:

library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()

which produces

enter image description here

and then

ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))

which is probably even worse

enter image description here

since it now gives the impression that something is missing between "1" and "2", and it is also not entirely clear which bar has the value "1" (the bar is to the right of the tick) and which bar has the value "2" (the bar is to the left of the tick).

I understand that technically ggplot provides the "right" visual answer for a log scale. Yet as an observer I have trouble understanding it.

Is there a way to improve this?

EDIT:

This is what happens when I apply Jaap's solution to my real data:

enter image description here

Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why is the plot also mapping x=1.5 and x=2.5?

Upvotes: 16

Views: 24209

Answers (4)

lljotall

Reputation: 11

I had the same issue and, inspired by @Jaap's answer, I fiddled with the histogram binwidth while using a log-scaled x-axis.

If you use binwidth = 0.201, the bars will be juxtaposed as expected. However, because the binwidth is measured in log10 units, this means you can only have up to five bars per decade (for example between 1 and 10).

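library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
# binwidth is in log10 units here; zeros are dropped with a warning because log10(0) is -Inf
ggplot(data, aes(x=dist)) + 
   geom_histogram(binwidth = 0.201, color = 'red') + 
   scale_x_log10()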

Result:

enter image description here
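
To make the "five bars" limit concrete: on a log10 axis every decade has a width of exactly 1, so the number of bins per decade is roughly 1 / binwidth. The base R sketch below only illustrates that arithmetic. The names `bins_per_decade` and `edges` are made up for this example, and the edges are assumed to start at 1 for simplicity; ggplot2 picks its own bin alignment.

```r
# With scale_x_log10(), binwidth is expressed in log10 units,
# so each decade (1 to 10, 10 to 100, ...) spans a width of 1.
binwidth <- 0.201
bins_per_decade <- 1 / binwidth   # just under 5

# Illustrative bin edges inside the first decade, back-transformed to data units.
# Assumption: edges start at 10^0 = 1; ggplot2 may align its bins differently.
log_edges <- seq(0, 1, by = binwidth)
edges <- 10^log_edges
print(round(edges, 2))
```

A smaller binwidth gives more bars per decade, but the bars then no longer sit next to each other for small integer values.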

Upvotes: 0

ROLO

Reputation: 4223

A solution could be to convert your data to a factor:

library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
ggplot(data, aes(x=factor(dist))) + 
    geom_bar() +  # geom_bar() counts the rows in each factor level
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Resulting in: enter image description here
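
With a factor on the x-axis, each distinct value gets its own equally spaced bar, so the axis is no longer numeric and values that never occur take up no room. A small base R sketch of the counts behind those bars, assuming the same simulated data as above; `counts` is just an illustrative name.

```r
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# One count per distinct value: this is what the count stat computes per factor level.
counts <- table(data$dist)

# The names are the distinct values in increasing numeric order,
# i.e. the same order as the bars on the factor axis.
head(counts)
length(counts)   # number of bars the factor plot will draw
```

Because the spacing is uniform, the plot says nothing about the distance between values; only the tick labels carry that information.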

Upvotes: 4

Gaurav Singhal

Reputation: 1086

What if the y-axis is log-scaled instead of the x-axis? This results in a few warnings wherever a bin's count is 0, but it may serve your purpose.

library(ggplot2)
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()

Basic Graph

You may also want to display the frequencies as data labels, since readers might overlook the y scale and it takes some time to realize that it is logarithmic.

# after_stat(count) replaces the older ..count.. notation (requires ggplot2 >= 3.3.0)
ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
  stat_bin(geom="text", size=3.5, aes(label=after_stat(count), y=0.8*after_stat(count)))

enter image description here

Upvotes: 5

Jaap

Reputation: 83215

The first thing that comes to mind is playing with the binwidth, but that doesn't give a great solution either:

ggplot(data, aes(x=dist)) +
  geom_histogram(binwidth=10) +
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0.015,0)) +
  theme_bw()

gives: enter image description here


In this case it is probably better to use a density plot. However, when you use scale_x_log10 you will get a warning message (Removed 524 rows containing non-finite values (stat_density)), because the data contain zeros and log10(0) is -Inf. This can be resolved by using a log plus one (log1p) transformation.
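
As a quick illustration of why the log plus one transformation avoids the warning, here is a minimal base R check on a few hand-picked values (the vector `x` is made up for the example):

```r
x <- c(0, 1, 9, 99)

log10(x)   # the first element is -Inf, which a log10 scale has to drop
log1p(x)   # log(1 + x): zero maps to zero and every value stays finite

# log1p is inverted by expm1, so axis breaks can still be labelled in the original units.
expm1(log1p(x))
```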

The following code:

library(ggplot2)
library(scales)

ggplot(data, aes(x=dist)) +
  stat_density(aes(y=after_stat(count)), color="black", fill="blue", alpha=0.3) +  # after_stat() needs ggplot2 >= 3.3.0
  scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
  scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
  theme_bw()

will give this result: enter image description here
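
Regarding the dips mentioned in the question's edit: a density plot smooths the data with a kernel, so the curve is drawn at every x position, not only at the integers. ggplot2 applies the scale transformation before computing the density, and on the log1p scale the values 0, 1 and 2 are the furthest apart, so a small bandwidth lets the estimate fall between them. The base R sketch below illustrates this with toy data (an assumption, not the asker's real data) and hand-picked bandwidths; `dens_at` is a hypothetical helper written for this example.

```r
# Toy discrete data, chosen only for illustration.
toy <- rep(c(0, 1, 2), times = c(50, 30, 20))
z <- log1p(toy)   # work on the transformed scale, as the plot does

# Evaluate a Gaussian kernel density estimate at the three values
# and at one point in between each pair (x = 0.5 and x = 1.5).
at <- log1p(c(0, 0.5, 1, 1.5, 2))
dens_at <- function(bw) {
  d <- density(z, bw = bw)
  approx(d$x, d$y, xout = at)$y
}

narrow <- dens_at(0.05)  # small bandwidth: the curve drops at x = 0.5 and x = 1.5
wide   <- dens_at(0.5)   # larger bandwidth: no drop below both neighbours
round(rbind(narrow, wide), 3)
```

In ggplot2, the `adjust` argument of stat_density multiplies the default bandwidth, so a value above 1 may smooth such dips away, at the cost of blurring the discrete structure.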

Upvotes: 13
