J. Paul

Reputation: 391

Count and axis labels on stat_bin2d with ggplot

I am trying to make a 2D histogram with the individual bins showing both the bin contents and a gradient. The data are integers ranging from 0 to 4 (only) in both axes.

I tried working from this answer, but I ran into a few issues. First, some bins get no fill at all: in the MWE below, the bottom-left bins with counts 130 and 60 are blank. Second, the bins are shifted below 0 on both axes. For the axis issue, I found I could simply add 0.5 to both x and y. Ultimately, though, I would also like the axis labels to be centered within each bin, and adding 0.5 does not address that.

library(ggplot2)

# Construct the data to be plotted
x <- c(rep(0,190),rep(1,50),rep(2,10),rep(3,40))
y <- c(rep(0,130),rep(1,80),rep(2,30),rep(3,10),rep(4,40))
data <- data.frame(x,y)

# Taken from the example
ggplot(data, aes(x = x, y = y)) +
  geom_bin2d(binwidth=1) + 
  stat_bin2d(geom = "text", aes(label = ..count..), binwidth=1) + 
  scale_fill_gradient(low = "snow3", high = "red", trans = "log10") + 
  xlim(-1, 5) +
  ylim(-1, 5) +
  coord_equal()


Is there something obvious I am doing wrong in both the color gradients and axis labels? I am also not married to ggplot or stat_bin2d if there is a better way to do it with some other package/command. Thanks in advance!

Upvotes: 3

Views: 1881

Answers (1)

eipi10

Reputation: 93761

stat_bin2d uses the cut function to create the bins. By default, cut creates bins that are open on the left and closed on the right. stat_bin2d also sets include.lowest=TRUE, so the lowest interval is closed on the left as well. I haven't looked through the code for stat_bin2d to figure out exactly what's going wrong, but it seems to be related to how the breaks passed to cut are chosen. In any case, you can get the desired behavior by explicitly setting the bin breaks to start at -1. For example:

ggplot(data, aes(x = x, y = y)) +
  geom_bin2d(breaks=c(-1:4)) + 
  stat_bin2d(geom = "text", aes(label = ..count..), breaks=c(-1:4)) + 
  scale_fill_gradient(low = "snow3", high = "red", trans = "log10") + 
  xlim(-1, 5) +
  ylim(-1, 5) +
  coord_equal() 
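
The interval conventions described above can be checked directly in base R. This is only a sketch of how cut treats values that sit exactly on a break, using the integer values 0 to 4 from the question. It does not reproduce the internals of stat_bin2d, and the variable names are made up for illustration.

```r
# The integer values that can occur on each axis in the question
vals <- 0:4

# Default cut(): intervals are open on the left and closed on the right,
# so a value sitting on the lowest break (0) falls outside (0,1] and becomes NA.
default_bins <- cut(vals, breaks = 0:4)

# include.lowest = TRUE closes the first interval on the left, giving [0,1].
# The values 0 and 1 then land in the same bin.
closed_bins <- cut(vals, breaks = 0:4, include.lowest = TRUE)

# Starting the breaks at -1 gives every integer a bin of its own.
shifted_bins <- cut(vals, breaks = -1:4, include.lowest = TRUE)

print(data.frame(vals, default_bins, closed_bins, shifted_bins))
```

With breaks starting at -1, each integer is the right-hand edge of its own interval, which is consistent with the shifted breaks fixing the blank bins.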


To center the tiles on the integer lattice points, set the breaks to half-integer values:

ggplot(data, aes(x = x, y = y)) +
  geom_bin2d(breaks=seq(-0.5,4.5,1)) + 
  stat_bin2d(geom = "text", aes(label = ..count..), breaks=seq(-0.5,4.5,1)) + 
  scale_fill_gradient(low = "snow3", high = "red", trans = "log10") + 
  scale_x_continuous(breaks=0:4, limits=c(-0.5,4.5)) +
  scale_y_continuous(breaks=0:4, limits=c(-0.5,4.5)) +
  coord_equal()
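
To see why half-integer breaks center the tiles, one can compute the midpoint of each bin. A minimal base R sketch, with illustrative variable names that are not part of ggplot2:

```r
# Half-integer breaks, as in the plot above
brks <- seq(-0.5, 4.5, 1)

# Midpoint of each bin: average of its left and right edge
mids <- (head(brks, -1) + tail(brks, -1)) / 2

# Assign the integers 0..4 to bins; none of them sits on a break,
# so the open/closed convention of cut() no longer matters here.
bins <- cut(0:4, breaks = brks)

print(data.frame(value = 0:4, bin = bins, midpoint = mids))
```

Each integer falls strictly inside one bin and coincides with that bin's midpoint, so axis breaks placed at 0:4 line up with the tile centers.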


Or, to emphasize that the values are discrete, set the bins to be half a unit wide:

ggplot(data, aes(x = x, y = y)) +
  geom_bin2d(breaks=seq(-0.25,4.25,0.5)) + 
  stat_bin2d(geom = "text", aes(label = ..count..), breaks=seq(-0.25,4.25,0.5)) + 
  scale_fill_gradient(low = "snow3", high = "red", trans = "log10") + 
  scale_x_continuous(breaks=0:4, limits=c(-0.25,4.25)) +
  scale_y_continuous(breaks=0:4, limits=c(-0.25,4.25)) +
  coord_equal()
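
Whichever set of breaks is used, the labels should match a plain cross-tabulation of the data. A quick check in base R, rebuilding the vectors from the question (the names counts and nonzero are just for illustration):

```r
# Data from the question
x <- c(rep(0, 190), rep(1, 50), rep(2, 10), rep(3, 40))
y <- c(rep(0, 130), rep(1, 80), rep(2, 30), rep(3, 10), rep(4, 40))

# Cross-tabulate the (x, y) pairs; each cell is the count for one integer bin
counts <- table(x = x, y = y)

# Long format, keeping only the cells that actually contain observations
nonzero <- subset(as.data.frame(counts), Freq > 0)

print(nonzero)
```

The six non-empty cells (130, 60, 20, 30, 10 and 40) are the numbers that should appear as text labels in the plots.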


Upvotes: 3
