jjcii

Reputation: 159

ggalluvial package in R - Setting alluvial fill color axis by axis (rather than solely by final axis)

I am currently trying to produce an alluvial plot in R with the ggalluvial package. With it I wish to plot how cases migrate between different values across successive segments of time (Seg1, Seg2, Seg3, Seg4). At Seg1 all cases have the value "workseg"; at Seg2 the value may be one of three other values (related content, unrelated content, NONE); Seg3 and Seg4 values can be any of the four options.

Using the following code...

library(dplyr)       # provides %>% and mutate()
library(ggplot2)
library(ggalluvial)

## Reorder levels per segment (make the vertical order of strata levels
## identical across all axes, rather than "zig-zag"; this is just an
## aesthetic preference)

dRG.lode <- dRG %>%
  mutate(Seg2 = factor(Seg2, levels=c("workseg", "related content", 
"unrelated content", "NONE")),
         Seg3 = factor(Seg3, levels=c("workseg", "related content", 
"unrelated content", "NONE")),
         Seg4 = factor(Seg4, levels=c("workseg", "related content", 
"unrelated content", "NONE")))


##Plot##

ggplot(as.data.frame(dRG.lode),
       aes(axis1 = Seg1, axis2 = Seg2, axis3 = Seg3, axis4 = Seg4)) +
  geom_alluvium(aes(fill = Seg4), width = 1/12) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  geom_label(stat = "stratum", label.strata = TRUE) +
  scale_x_discrete(limits = c("Seg1", "Seg2", "Seg3", "Seg4"), expand = 
c(.05, .05, .05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("Time Course, Segment by Segment")

...I have been able to produce the following plot:

[image: the resulting alluvial plot]

My main question:

1) Is there a means by which to have the alluvial fill color not be consistent across an entire alluvial strand from start to finish, based on the Seg4 value, but INSTEAD to have the color change axis by axis, based upon the current axis value? For instance, I'd like all strands with a stratum value of "workseg" at a given axis to be blue between that axis and the previous axis. Something akin to this seems possible based upon the vaccinations example at the bottom of this vignette (see last plot above appendix). The fills in that example reflect the stratum each flow came from, axis to axis (e.g., all strands coming from a "Never" stratum are teal, regardless of their value at the next axis). I basically want to implement the inverse of this: fill based on the stratum at the next axis (e.g., all strands leading to a "workseg" stratum are blue, regardless of their value at the previous axis).

An unrelated, secondary question:

2) Is there any means by which to add annotation to alluvia? That is, the axes contain strata labels based upon values in the data set, but is there a means by which to add labels or other annotative information to the strands themselves (beyond manually in post-production)?

Upvotes: 0

Views: 5614

Answers (1)

Cory Brunson

Reputation: 718

In the vaccinations example, flows adopt aesthetics from the strata to their left via geom_flow() with fill = response (the stratum variable); this can't be done using geom_alluvium(), which renders each complete alluvium as a single graphical object ("grob"). The data you've linked to are in what ggalluvial considers "wide" format, i.e. each axis is a variable, but in order to have a single, consistent stratum variable to map to fill, the data need to be in "long" format, i.e. one row per case per axis.

The code below makes both of these changes and uses aes.flow = "backward" (see the documentation) to have flows adopt aesthetics from the strata to their right (instead of their left).

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(ggalluvial)

dRG <- read.csv("~/Downloads/mydata.csv")

dRG.lode <- dRG %>%
  mutate(Seg2 = factor(Seg2, levels=c("workseg", "related content", 
                                      "unrelated content", "NONE")),
         Seg3 = factor(Seg3, levels=c("workseg", "related content", 
                                      "unrelated content", "NONE")),
         Seg4 = factor(Seg4, levels=c("workseg", "related content", 
                                      "unrelated content", "NONE")))

dRG.long <- to_lodes_form(dRG.lode, -X,
                          key = "segment", value = "value", id = "id")

ggplot(dRG.long,
       aes(x = segment, stratum = value, alluvium = id)) +
  geom_flow(aes(fill = value), width = 1/12, aes.flow = "backward") +
  guides(fill = FALSE) +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  geom_label(stat = "stratum", label.strata = TRUE) +
  scale_x_discrete(limits = c("Seg1", "Seg2", "Seg3", "Seg4"), expand = 
                     c(.05, .05, .05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("Time Course, Segment by Segment")

Created on 2019-03-28 by the reprex package (v0.2.1)
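
Since the code above depends on a local CSV, here is a minimal sketch of the same two steps on a small made-up data set, so it can be run without the original file. The data frame toy and the object names toy_long and p are hypothetical and only for illustration; axes = 1:4 assumes the four segment columns are the first four columns. Labels are left out to keep the sketch short.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data in "wide" format: one row per case, one column per segment.
lv <- c("workseg", "related content", "unrelated content", "NONE")
toy <- data.frame(
  Seg1 = factor(rep("workseg", 4), levels = lv),
  Seg2 = factor(c("related content", "unrelated content", "NONE",
                  "related content"), levels = lv),
  Seg3 = factor(c("workseg", "NONE", "related content", "workseg"),
                levels = lv),
  Seg4 = factor(c("workseg", "workseg", "NONE", "unrelated content"),
                levels = lv)
)

# Step 1: reshape to "long" (lodes) format, one row per case per segment,
# so that a single column ("value") holds the stratum at every axis.
toy_long <- to_lodes_form(toy, axes = 1:4,
                          key = "segment", value = "value", id = "id")

# Step 2: draw flows instead of whole alluvia; aes.flow = "backward" makes
# each flow take its fill from the stratum it leads to (on its right).
p <- ggplot(toy_long,
            aes(x = segment, stratum = value, alluvium = id)) +
  geom_flow(aes(fill = value), width = 1/12, aes.flow = "backward") +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  scale_fill_brewer(type = "qual", palette = "Set1")

print(p)
```

Switching aes.flow to "forward" in this sketch should give the vaccinations-style coloring, where each flow is filled by the stratum it comes from.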

On reflection, the naming conventions for the parameter aes.flow and its options "forward" and "backward" might not be the most intuitive. I'd welcome suggestions on that!

Upvotes: 5
