MaVe

Reputation: 219

How to add percentage values to strata in alluvial plot with ggalluvial?

I'm looking for the most convenient way to add rounded percentage labels to the strata of an alluvial plot. There are 50 cases in the following example. At each stage (1 and 2), every case belongs to exactly one of the groups A, B or C. I'd like to display the relative group membership at each stage.

library(ggplot2)
library(ggalluvial)

df <- data.frame('id' = rep(1:50,2),
                 'stage' = c(rep(1,50), rep(2,50)),
                 'group' = sample(c('A','B','C'), 100, replace = TRUE))

ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5)


Is there a way to add rounded percentage labels (including "%") to the strata (the bar segments) without calculating a percentage column in the initial data frame? If I'm not mistaken, geom_text() doesn't work here the same way it does with geom_bar().

Upvotes: 2

Views: 4676

Answers (2)

Cory Brunson

Reputation: 718

The standard ggplot2 solution to this question is to use "calculated aesthetics": aesthetic specifications that come not from the data passed to ggplot() but from the output of the statistical transformation (the stat_*()) that is used to render the graphical elements (the geom_*()). The columns of this output, which users rarely see, are called "computed variables". The documentation on this topic is limited and somewhat out of date, using stat() instead of after_stat() to call them. Because ggalluvial did not previously support computed variables, the answer from @bencekd was correct at the time it was written.
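For comparison, here is a minimal sketch of a computed variable in plain ggplot2, the geom_bar() case mentioned in the question. It assumes ggplot2 3.3.0 or later (where after_stat() was introduced) and uses the built-in mpg data rather than the question's data. geom_bar() uses stat_count(), and after_stat(count) tells geom_text() to take its label from that stat's output instead of from a column of mpg:

```r
library(ggplot2)

# geom_bar() uses stat_count(); its (normally hidden) output contains the
# computed variables `count` and `prop` for each bar
p <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  # stat = "count" makes geom_text() run the same stat as geom_bar(), and
  # after_stat(count) maps `label` to the stat's computed `count` column
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)

print(p)
```

geom_text() needs `stat = "count"` here because its default stat is the identity, which has no `count` column to refer to; the ggalluvial solution relies on the same mechanism with `stat = "stratum"`.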

As of this writing, v0.12.0 is on CRAN with support and documentation for computed variables. In particular, three computed variables are available that correspond to variables with the same names used by stat_bin() or stat_count(): n, count (a weighted version of n), and prop (a within-axis proportion calculated from count). It looks like you want prop, as illustrated below:

library(ggplot2)
library(scales)      # for percent()
library(ggalluvial)  # v0.12.0 or later, for computed variables

df <- data.frame('id' = rep(1:50,2),
                 'stage' = c(rep(1,50), rep(2,50)),
                 'group' = sample(c('A','B','C'), 100, replace = TRUE))

ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
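  # stat = "stratum" gives one label per stratum; after_stat(prop) is that
  # stratum's proportion within its axis, formatted as e.g. "34.0%"
  geom_text(stat = "stratum",
            aes(label = percent(after_stat(prop), accuracy = .1)))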

Created on 2020-07-14 by the reprex package (v0.3.0)
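To make the relationship between n, count and prop concrete, here is a small base-R sketch that computes them by hand, assuming the definitions given above (count is the weighted version of n, prop is count divided by the total count on the same axis). This is an illustration, not ggalluvial's internal code, and the `weight` column is hypothetical: it is not part of the question's data and only shows how count can differ from n. The last step formats prop as rounded percentages including "%", without the scales package:

```r
# Toy lodes: one row per case per stage; `weight` is a hypothetical case weight
lodes <- data.frame(stage  = c(1, 1, 1, 1, 2, 2, 2, 2),
                    group  = c("A", "A", "B", "C", "A", "B", "B", "C"),
                    weight = c(1, 2, 1, 1, 1, 1, 3, 1))

# n: number of cases in each stratum (rows = stages/axes, columns = groups)
n_tab <- table(lodes$stage, lodes$group)

# count: summed weights in each stratum
count_tab <- xtabs(weight ~ stage + group, data = lodes)

# prop: count divided by the total count on the same axis (row)
prop_tab <- prop.table(count_tab, margin = 1)

# Rounded percentage labels, stage by stage (t() so stage 1 comes first)
pct_labels <- paste0(formatC(100 * as.vector(t(prop_tab)),
                             format = "f", digits = 0), "%")

print(n_tab)
print(count_tab)
print(round(prop_tab, 3))
print(pct_labels)
```

Without any weights, count equals n, and prop reduces to the share of cases in each group at each stage, which is what the plot above labels.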

Upvotes: 4

bencekd

Reputation: 1595

Unfortunately, I don't think this is possible yet without calculating the percentages from the initial data frame. But that is easy to do and also gives more flexibility in the labeling:

library(ggplot2)
library(ggalluvial)
library(dplyr)   # for %>%, group_by(), summarize(), mutate(), arrange(), pull()
library(scales)  # for percent()

df <- data.frame('id' = rep(1:50,2),
                 'stage' = c(rep(1,50), rep(2,50)),
                 'group' = sample(c('A','B','C'), 100, replace = TRUE))

# the labels need to be in reverse group order, as strata are displayed in reverse order in the alluvial plot by default

stratum_list <- df %>% 
  group_by(stage, group) %>% 
  summarize(s = n()) %>%
  group_by(stage) %>%
  mutate(s = percent(s/sum(s), accuracy=0.1)) %>%
  # desc() works whether group is a character or a factor column
  arrange(stage, desc(group)) %>% 
  pull(s)

ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) + 
  # one fixed label per stratum, in the order the strata are drawn
  geom_text(stat = "stratum", label = stratum_list)


UPDATE [13/04/2020]

Added the stratum_list reversal, as Yonghao suggested.
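If you would rather not depend on dplyr and scales, a base-R sketch of the same label vector could look like the following. It assumes the same reversed group order as the dplyr pipeline above and drops groups that are absent from a stage, which the pipeline does implicitly because summarize() only returns observed combinations. The seed and the names `props`, `vals` and `stratum_labels` are arbitrary choices for this illustration:

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(id    = rep(1:50, 2),
                 stage = c(rep(1, 50), rep(2, 50)),
                 group = sample(c("A", "B", "C"), 100, replace = TRUE))

# Within-stage proportions: rows = stages, columns = groups
props <- prop.table(table(df$stage, df$group), margin = 1)

# Transpose so each column is one stage, then reverse the group order
# to match the reversed stacking assumed above
m <- t(props)[rev(seq_len(ncol(props))), , drop = FALSE]

# Flatten stage by stage and drop empty strata, which are not drawn
vals <- as.vector(m)
vals <- vals[vals > 0]

# One decimal place plus "%", like percent(..., accuracy = 0.1)
stratum_labels <- sprintf("%.1f%%", 100 * vals)
print(stratum_labels)
```

The result could then be passed as `label = stratum_labels` in the geom_text() call above; as with stratum_list, its length must equal the number of strata drawn.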

Upvotes: 1
