Reputation: 1365
Through advice from this site, I have built a hexbin plot in ggplot which shows the count of data points in each bin, and highlights particular bins of interest.
I now want to extend this plot one step further to show the proportion of a second grouping category within each hexbin. This can already be acheived with the hextri package, but I can't combine the ggplot solutions from my previous question with output from the hextri package.
The ultimate goal is to have a plot that looks like the output from the hextri package, and highlights the cells of interest.
Below is some example data code that can create the ggplot with highlighted cells, and the hextri plot with the categorical proportions shown. These two features are what I want to combine.
I have tried playing with the border input of the hextri function to achieve the desired outcome but with no success yet.
library(hextri)
library(ggplot2)
n = 100
df = data.frame(x = rnorm(n),
y = rnorm(n),
group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))
# hextri plot
hextri_plot = hextri(
df$x,
df$y,
class = df$group,
colour = c("#00b38a", "#ea324c"),
nbins = 3,
diffuse = FALSE,
sorted = TRUE
)
# GGplot
ggplot(df, aes(x = x, y = y)) +
geom_hex() +
stat_summary_hex(aes(
z = group,
color = after_stat(as.character(value))
), fun = ~ +any(.x == 1), fill = NA) +
scale_color_manual(
values = c("0" = "transparent", "1" = "yellow"),
guide = "none"
)
Upvotes: 8
Views: 575
Reputation: 173888
This is not a trivial problem. It requires writing a new Geom
, a new Stat
and a new Grob
(see below). I'm personally not convinced that it is a great data visualization option, as it is position-distorting and involves significant rounding errors. However, it is visually appealing and fairly intuitive, so I went ahead and wrote a geom_hextri
anyway. To get it to work, we simply map its aesthetics to a categorical variable and it should behave much as expected.
Let's use your own example data:
set.seed(1)
n = 100
df = data.frame(x = rnorm(n),
y = rnorm(n),
group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))
And plot it with geom_hextri
using your chosen color scheme. We will overlay points so we can ensure the logic of the segment fills matches the points.
ggplot(df, aes(x, y, fill = factor(group), color = factor(group))) +
geom_hextri(linewidth = 0.3, bins = 4) +
geom_point(shape = 21, size = 3, color = "black") +
coord_equal() +
theme_classic(base_size = 16) +
theme(aspect.ratio = 1) +
scale_fill_manual("Group", values = c("#00b38a", "#ea324c")) +
scale_color_manual("Group", values = c("#00b38a", "#ea324c"))
Note that it's easy to change the bin size and aesthetics if we want. To get solid hexagons around our triangles, we just add a geom_hex
layer:
ggplot(df, aes(x, y, fill = factor(group))) +
geom_hextri(color = "black", linewidth = 0.1, bins = 5) +
geom_point(shape = 21, size = 3) +
geom_hex(fill = NA, color = "black", linewidth = 1, bins = 5) +
coord_equal() +
theme_classic(base_size = 16) +
theme(aspect.ratio = 1) +
scale_fill_manual("Group", values = c("gray", "red"))
And applying to another data set we get:
ggplot(iris, aes(Sepal.Width, Sepal.Length, fill = Species)) +
geom_hextri(color = "white", linewidth = 0.1, bins = 5) +
geom_point(shape = 21, size = 3, position = position_jitter(0.03, 0.03),
color = "white") +
geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
coord_equal() +
theme_minimal(base_size = 20) +
theme(aspect.ratio = 1) +
scale_fill_brewer(palette = "Set2")
Also note we don't need to use the fill aesthetic. We can, for example, simply change the outline colour:
ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
geom_hextri(fill = NA, linewidth = 2, bins = 5, alpha = 1) +
geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
coord_equal() +
theme_minimal(base_size = 20) +
theme(aspect.ratio = 1) +
scale_colour_brewer(palette = "Set1")
Code for geom_hextri
Now the difficult part - the implementation of geom_hextri
. I have tried to break this down into chunks, but the code is necessarily long and too difficult to explain in any great detail. I have also had to sacrifice spacing a bit to allow it to fit into code boxes that don't need scrolling.
Ultimately, ggplot has to draw objects on the plotting device as graphical objects (grobs
), but there is no existing off-the-shelf grob
that will draw these hexagonal segments, so we need to define a function that will draw them using grid::polygonGrob
, given appropriate x, y co-ordinates, heights, widths, graphical parameters, and the segment we are dealing with. This needs to accept vectorized data to work with ggplot:
hextriGrob <- function(x, y, seg, height, width, gp = grid::gpar()) {
gp <- lapply(seq_along(x), function(i) structure(gp[i], class = "gpar"))
xl <- x - width
xr <- x + width
y1 <- y + 2 * height
y2 <- y + height
y3 <- y - height
y4 <- y - 2 * height
pg <- grid::polygonGrob
do.call(grid::gList,
Map(function(x, y, xl, xr, y1, y2, y3, y4, seg, gp) {
if(seg == 1) return(pg(x = c(x, x, xr, x), y = c(y, y1, y2, y), gp = gp))
if(seg == 2) return(pg(x = c(x, xr, xr, x), y = c(y, y2, y3, y), gp = gp))
if(seg == 3) return(pg(x = c(x, xr, x, x), y = c(y, y3, y4, y), gp = gp))
if(seg == 4) return(pg(x = c(x, x, xl, x), y = c(y, y4, y3, y), gp = gp))
if(seg == 5) return(pg(x = c(x, xl, xl, x), y = c(y, y3, y2, y), gp = gp))
if(seg == 6) return(pg(x = c(x, xl, x, x), y = c(y, y2, y1, y), gp = gp))
}, x = x, y = y, xl = xl, xr = xr, y1 = y1,
y2 = y2, y3 = y3, y4 = y4, seg = seg, gp = gp))
}
But this isn't itself enough. We also need to define a geom
that inherits from GeomHex
, but has its own compute_group
method to invoke our hextriGrob
function appropriately. Part of its job will be to ensure that aesthetics are split correctly into segments, which for technical reasons cannot all easily be done within a Stat
layer.
GeomHextri <- ggproto("GeomHextri", GeomHex,
draw_group = function (self, data, panel_params, coord, lineend = "butt",
linejoin = "mitre", linemitre = 10) {
table_six <- function(vec) {
if(!is.factor(vec)) vec <- factor(vec)
tab <- round(6 * table(vec, useNA = "always")/length(vec))
n <- diff(c(0, findInterval(cumsum(tab) / sum(tab), 1:6/6)))
rep(names(tab), times = n)
}
num_cols <- sapply(data, is.numeric)
non_num_cols <- names(data)[!num_cols]
num_cols <- names(data)[num_cols]
datasplit <- split(data, interaction(data$x, data$y, drop = TRUE))
data <- do.call("rbind", lapply(seq_along(datasplit), function(i) {
num_list <- lapply(datasplit[[i]][num_cols], function(x) rep(mean(x), 6))
non_num_list <- lapply(datasplit[[i]][non_num_cols], function(x) {
table_six(rep(x, times = datasplit[[i]]$count))})
d <- datasplit[[i]][rep(1, 6),]
d[num_cols] <- num_list
d[non_num_cols] <- non_num_list
d$tri <- 1:6
d$group <- i
d}))
data <- ggplot2:::check_linewidth(data, snake_class(self))
if (ggplot2:::empty(data)) return(zeroGrob())
coords <- coord$transform(data, panel_params)
hw <- c(min(diff(unique(sort(coords$x)))),
min(diff(unique(sort(coords$y))))/3)
hextriGrob(coords$x, coords$y, data$tri, hw[2], hw[1],
gp = grid::gpar(col = data$colour, fill = alpha(data$fill, data$alpha),
lwd = data$linewidth * .pt, lty = data$linetype,
lineend = lineend, linejoin = linejoin,
linemitre = linemitre))})
Before our data even gets to this geom, it needs to be binned into hexagons. Unfortunately, the existing StatBinhex
will not be able to do this while preserving the individual segment-level aesthetic detail we need, so we have to write our own binning function:
hexify <- function (x, y, z, xbnds, ybnds, xbins, ybins, binwidth,
fun = mean, fun.args = list(),
drop = TRUE) {
hb <- hexbin::hexbin(x, xbnds = xbnds, xbins = xbins, y,
ybnds = ybnds, shape = ybins/xbins, IDs = TRUE)
value <- rlang::inject(tapply(z, hb@cID, fun, !!!fun.args))
out <- hexbin::hcell2xy(hb)
out <- ggplot2:::data_frame0(!!!out)
out$value <- as.vector(value)
out$width <- binwidth[1]
out$height <- binwidth[2]
if (drop) out <- stats::na.omit(out)
out
}
This then has to be used inside a custom Stat
:
StatHextri <- ggproto("StatBinhex", StatBinhex,
default_aes = aes(weight = 1, alpha = after_stat(count)),
compute_panel = function (self, data, scales, ...) {
if (ggplot2:::empty(data)) return(ggplot2:::data_frame0())
data$group <- 1
self$compute_group(data = data, scales = scales, ...)},
compute_group = function (data, scales, binwidth = NULL, bins = 30,
na.rm = FALSE){
`%||%` <- rlang::`%||%`
rlang::check_installed("hexbin", reason = "for `stat_binhex()`")
binwidth <- binwidth %||% ggplot2:::hex_binwidth(bins, scales)
if (length(binwidth) == 1) binwidth <- rep(binwidth, 2)
wt <- data$weight %||% rep(1L, nrow(data))
non_pos <- !names(data) %in% c("x", "y", "PANEL", "group")
is_num <- sapply(data, is.numeric)
aes_vars <- names(data)[non_pos & !is_num]
grps <- do.call("interaction", c(as.list(data[aes_vars]), drop = TRUE))
xbnds <- ggplot2:::hex_bounds(data$x, binwidth[1])
xbins <- diff(xbnds)/binwidth[1]
ybnds <- ggplot2:::hex_bounds(data$y, binwidth[2])
ybins <- diff(ybnds)/binwidth[2]
do.call("rbind", Map(function(data, wt) {
out <- hexify(data$x, data$y, wt, xbnds, ybnds, xbins,
ybins, binwidth, sum)
for(var in aes_vars) out[[var]] <- data[[var]][1]
out$density <- as.vector(out$value/sum(out$value, na.rm = TRUE))
out$ndensity <- out$density/max(out$density, na.rm = TRUE)
out$count <- out$value
out$ncount <- out$count/max(out$count, na.rm = TRUE)
out$value <- NULL
out$group <- 1
out}, split(data, grps), split(wt, grps)))})
Finally, we need to write a geom function so that we can easily invoke all of the above in a ggplot call:
geom_hextri <- function(
mapping = aes(),
data = NULL,
stat = "hextri",
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
bins = 10,
...) {
ggplot2::layer(
geom = GeomHextri,
data = data,
mapping = mapping,
stat = stat,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, bins = bins, ...)
)
}
Upvotes: 17