Reputation: 219
I'm looking for the most convenient way to add rounded percentage labels to the strata of an alluvial plot. There are 50 cases in the following example. In each of stages 1 and 2, every case belongs to one of the groups A, B or C. I'd like to display the relative group affiliation during each stage.
library(ggplot2)
library(ggalluvial)
df <- data.frame('id' = rep(1:50,2),
                 'stage' = c(rep(1,50), rep(2,50)),
                 'group' = sample(c('A','B','C'), 100, replace = TRUE))
ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5)
Is there a way to add rounded percentage labels (including "%") to the strata (bar segments) without calculating a percentage column in the initial data frame? If I'm not completely mistaken, geom_text() doesn't work the same way here as it does with geom_bar().
Upvotes: 2
Views: 4676
Reputation: 718
The standard ggplot2 solution to this question is to use "calculated aesthetics". These are aesthetic specifications that come not from the data passed to ggplot() but from the output of the statistical transformation (the stat_*()), which is used to render the graphical elements (the geom_*()). The columns of this output (which are rarely seen by the user) are called "computed variables". The documentation on this topic is limited and a bit out of date, using stat() instead of after_stat() to call them. Since ggalluvial did not support computed variables, the answer from @bencekd was correct at the time.
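To make the idea of a computed variable concrete, here is a minimal sketch using plain ggplot2 rather than ggalluvial. The toy data frame is made up for illustration, and after_stat() assumes ggplot2 3.3.0 or later. It labels each bar with count, a variable that exists only in the output of stat_count(), and then uses layer_data() to peek at that normally hidden output.

```r
library(ggplot2)

# Toy data, invented for this illustration only.
toy <- data.frame(group = c("A", "A", "B", "C", "C", "C"))

p <- ggplot(toy, aes(x = group)) +
  geom_bar() +
  # `count` is not a column of `toy`; stat_count() computes it.
  # after_stat() delays evaluation of the label until the stat has run.
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            vjust = -0.5)

# layer_data() returns the data actually used to draw a layer,
# including the label built from the computed variable.
labels <- layer_data(p, 2)$label
```

Printing p draws the labelled bars; labels holds one value per bar, in the order of the x axis.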
As of today, v0.12.0 is on CRAN with support and documentation for computed variables. In particular, three computed variables are available that correspond to variables with the same names used by stat_bin() or stat_count(): n, count (a weighted version of n), and prop (a within-axis proportion calculated from count). It looks like you'd want to use prop, as illustrated below:
library(ggplot2)
library(scales)
library(ggalluvial)
df <- data.frame('id' = rep(1:50,2),
                 'stage' = c(rep(1,50), rep(2,50)),
                 'group' = sample(c('A','B','C'), 100, replace = TRUE))
ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum",
            aes(label = percent(after_stat(prop), accuracy = .1)))
Created on 2020-07-14 by the reprex package (v0.3.0)
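For intuition about how n, count and prop relate, here is a small base R sketch that follows the definitions given above. It is not ggalluvial's internal code, and the toy data frame and its weight column are hypothetical names chosen for the illustration.

```r
# Toy data: two axes, two strata, with a made-up weight per row.
toy <- data.frame(axis    = c(1, 1, 1, 2, 2, 2),
                  stratum = c("A", "A", "B", "A", "B", "B"),
                  weight  = c(1, 2, 3, 1, 1, 2))

# n: unweighted number of rows per axis and stratum.
n <- table(toy$axis, toy$stratum)

# count: the weighted version of n (sum of weights per cell).
count <- xtabs(weight ~ axis + stratum, data = toy)

# prop: within-axis proportion, i.e. each count divided by its axis total.
prop <- prop.table(count, margin = 1)

# Rounded percentage strings, similar in spirit to scales::percent().
pct <- sprintf("%.1f%%", 100 * prop)
```

With unweighted data such as the example in the question, n and count coincide, so prop is simply the share of cases per group within each stage.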
Upvotes: 4
Reputation: 1595
Unfortunately I don't think you can do it without calculating the percentage column in the initial data frame yet. But that can be done easily and also gives more flexibility with the labeling:
library(ggplot2)
library(ggalluvial)
library(dplyr)   # for %>%, group_by(), summarize(), mutate(), arrange(), pull()
library(scales)  # for percent()
df <- data.frame('id' = rep(1:50,2),
                 'stage' = c(rep(1,50), rep(2,50)),
                 'group' = sample(c('A','B','C'), 100, replace = TRUE))
# the list needs to be reversed, as strata are displayed reversed in the alluvial plot by default
# desc() is used instead of -as.numeric(group) so the sort also works when
# 'group' is a character column (the data.frame() default since R 4.0)
stratum_list <- df %>%
  group_by(stage, group) %>%
  summarize(s = n()) %>%
  group_by(stage) %>%
  mutate(s = percent(s/sum(s), accuracy = 0.1)) %>%
  arrange(stage, desc(group)) %>%
  pull(s)
ggplot(df,
       aes(x = stage, stratum = group, alluvium = id, fill = group)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", label = stratum_list)
UPDATE [13/04/2020]
Added stratum_list reversal, as Yonghao suggested
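As an example of the extra labeling flexibility mentioned in this answer, here is a hedged base R sketch that builds labels combining the group name with its percentage, in the same reversed within-stage order. The fixed seed and the name stratum_labels are assumptions made for the illustration, and the ordering has not been checked against every ggalluvial version.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(id    = rep(1:50, 2),
                 stage = c(rep(1, 50), rep(2, 50)),
                 group = sample(c("A", "B", "C"), 100, replace = TRUE))

# Counts per stage (rows) and group (columns), then row-wise proportions.
counts <- table(df$stage, df$group)
props  <- prop.table(counts, margin = 1)

# Reverse the group order within each stage, then flatten stage by stage.
rev_groups <- rev(colnames(props))
stratum_labels <- unlist(lapply(rownames(props), function(s) {
  sprintf("%s (%.1f%%)", rev_groups, 100 * props[s, rev_groups])
}))
```

The resulting vector could then be passed as label = stratum_labels to geom_text(stat = "stratum", ...) in place of the plain percentages. Note that a group missing from a stage would still get a "0.0%" entry here, so the vector length would no longer match the number of drawn strata.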
Upvotes: 1