Dambo

Reputation: 3486

How to map individual paths to an alluvial diagram?

I am trying to use ggalluvial to track academic paths of students over semesters and see how students change curriculum over time.

This is a sample of my dataset:

structure(list(id = c("1", "2", "6", "8", "9", "10", "11", "12", 
"14", "15", "1", "2", "6", "8", "9", "10", "11", "12", "14", 
"15", "1", "2", "6", "8", "9", "10", "11", "12", "14", "15", 
"1", "2", "6", "8", "9", "10", "11", "12", "14", "15", "1", "2", 
"6", "8", "9", "10", "11", "12", "14", "15", "1", "2", "6", "8", 
"9", "10", "11", "12", "14", "15", "1", "2", "6", "8", "9", "10", 
"11", "12", "14", "15", "1", "2", "6", "8", "9", "10", "11", 
"12", "14", "15"), 
curr = c("CURR1", "CURR1", "CURR1", "CURR1", 
    "CURR1", "CURR1", "CURR1", "CURR1", "CURR1", "CURR1", "CURR3", 
    "CURR3", "CURR3", "CURR3", "CURR3", "CURR3", "CURR3", "CURR3", 
    "CURR3", "CURR3", "CURR5", "CURR5", "CURR5", "CURR5", "CURR5", 
    "CURR5", "CURR5", "CURR5", "CURR5", "CURR5", "CURR7", "CURR7", 
    "CURR7", "CURR7", "CURR7", "CURR7", "CURR7", "CURR7", "CURR7", 
    "CURR7", "CURR9", "CURR9", "CURR9", "CURR9", "CURR9", "CURR9", 
    "CURR9", "CURR9", "CURR9", "CURR9", "CURR11", "CURR11", "CURR11", 
    "CURR11", "CURR11", "CURR11", "CURR11", "CURR11", "CURR11", "CURR11", 
    "CURR13", "CURR13", "CURR13", "CURR13", "CURR13", "CURR13", "CURR13", 
    "CURR13", "CURR13", "CURR13", "CURR15", "CURR15", "CURR15", "CURR15", 
    "CURR15", "CURR15", "CURR15", "CURR15", "CURR15", "CURR15"), 
        value = c("ISDS", "ISDS", "GBUS", "ISDS", "GBUS", "ISDS", 
        "ACCT", "GBUS", "ITF", "MGT", "ISDS", "ISDS", "GBUS", "ISDS", 
        "MKT", "ISDS", "ACCT", "GBUS", "ITF", "MGT", "ISDS", "ISDS", 
        "ISDS", "ISDS", "MKT", "ISDS", "ACCT", "GBUS", "ISDS", "MGT", 
        "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ACCT", "GBUS", 
        "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", NA, "ISDS", "ISDS", 
        "ACCT", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", 
        "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", NA, 
        "ISDS", "ISDS", "ISDS", NA, "ISDS", "ISDS", "ISDS", "ISDS", 
        "ISDS", NA, "ISDS", "ISDS", "ISDS", NA, "ISDS", "ISDS", "ISDS", 
        NA)), class = "data.frame", row.names = c(NA, -80L), .Names = c("id", 
    "curr", "value"))

I would like to map:

  1. CURR (a time variable), to the x-axis

  2. value to different heights of the y-axis

  3. the count of value for each CURR to the width of the flows

The diagram should present from which/into which curriculum they "flow" over time.
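To make the target concrete, here is a minimal base R sketch of the two quantities the diagram should encode: the number of students per curriculum at each time point, and the number of students moving between curricula from one time point to the next. The `toy` data frame and its values are invented for illustration; it only mimics the shape of the dataset above.

```r
# Toy long-format data in the same shape as the sample (values are made up)
toy <- data.frame(
  id    = rep(c("1", "2", "3"), times = 2),
  curr  = rep(c("CURR1", "CURR3"), each = 3),
  value = c("ISDS", "GBUS", "GBUS", "ISDS", "ISDS", NA),
  stringsAsFactors = FALSE
)

# Students per curriculum at each time point (the heights of the strata);
# useNA = "ifany" keeps students with a missing value visible
counts <- table(toy$curr, toy$value, useNA = "ifany")
print(counts)

# Students moving between curricula from CURR1 to CURR3 (the flow widths)
first  <- toy[toy$curr == "CURR1", c("id", "value")]
second <- toy[toy$curr == "CURR3", c("id", "value")]
moves  <- merge(first, second, by = "id", suffixes = c("_from", "_to"))
flows  <- table(from = moves$value_from, to = moves$value_to, useNA = "ifany")
print(flows)
```

Each cell of `flows` is one ribbon between two adjacent time points, which is what I would like to see drawn.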

This is what I have so far, which is pretty far off:

ggplot(as.data.frame(ff2),
      aes(x=curr, axis1=value, group=id)) +
     geom_alluvium(aes(fill = value))

The x-axis looks alright, but the flow widths do not reflect the different weights of the curricula over time, nor can I follow the students' "flows".

Upvotes: 2

Views: 740

Answers (2)

Cory Brunson

Reputation: 718

Sorry for the delay. I just merged an experimental branch including a separate geom for plotting the "flows" between axes instead of the full "alluvia" that span the entire diagram, plus a bunch of new parameters. This makes the kind of plot you're describing possible using the following code, assuming that ff2 is assigned the structure() call in the OP.

library(ggplot2)
library(ggalluvial)

# keep the values of 'curr' in their proper order
ff2$curr <- factor(ff2$curr, levels = unique(ff2$curr))
ggplot(ff2, aes(
  # position aesthetics:
  # 'x' as in 'geom_bar()'
  # 'stratum' and 'alluvium' specific to ggalluvial
  x = curr, stratum = value, alluvium = id,
  # apply 'fill' colors to both flows and strata
  fill = value
)) +
  # flow parameters:
  # 'lode.guidance' says how to arrange splines in each stratum
  # 'aes.flow' says which axis determines flow aesthetics
  geom_flow(lode.guidance = "rightleft", aes.flow = "forward") +
  geom_stratum() +
  # include text labels at each stratum
  geom_text(stat = "stratum")

Thanks for pointing out this need, especially for handling NAs in a consistent way!
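For readers on a newer installation, here is a hedged sketch of the same idea. It assumes a recent ggalluvial release (1.0 or later) together with ggplot2 3.3.0 or later, where stratum labels are supplied through `after_stat(stratum)`. The `toy` data are invented, and the parameters from the code above are left at their defaults because their names may differ between versions.

```r
library(ggplot2)
library(ggalluvial)

# Invented long-format data: one row per student per time point
toy <- data.frame(
  id    = rep(c("1", "2", "3", "4"), times = 3),
  curr  = rep(c("CURR1", "CURR3", "CURR5"), each = 4),
  value = c("ISDS", "GBUS", "GBUS", "MKT",
            "ISDS", "ISDS", "GBUS", "MKT",
            "ISDS", "ISDS", "ISDS", "GBUS"),
  stringsAsFactors = FALSE
)

# Keep the time points in their order of appearance
toy$curr <- factor(toy$curr, levels = unique(toy$curr))

p <- ggplot(toy, aes(x = curr, stratum = value, alluvium = id, fill = value)) +
  # ribbons between adjacent time points
  geom_flow() +
  # stacked boxes for each curriculum at each time point
  geom_stratum() +
  # label each box with its curriculum (assumes the after_stat() interface)
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

print(p)
```

With one row per student per time point, each student contributes one unit of height, so stratum heights are student counts and flow widths are the numbers of students moving between curricula.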

Upvotes: 2

jesstme

Reputation: 624

Try the following. It's not gorgeous, but it works. You can use base graphics to clean it up a bit.

Install the following packages if you haven't already, then load them:

library(alluvial) 
library(tidyr) 

Edit your data:

ff2$value[is.na(ff2$value)] <- "None" # Replace NAs with a category so they're not lost
ff2$curr <- as.numeric(substr(ff2$curr, 5, nchar(ff2$curr))) # Change your term labels to numeric for easy & correct ordering
ff3 <- spread(ff2, curr, value, fill = "None") # Spread your df from long to wide format (one row per student, one column per term)
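To see what the reshaping step does, here is a small sketch on invented data (the `toy` data frame is hypothetical). It uses the same `spread()` call, so that a student with no row for a given term ends up with "None" in that column.

```r
library(tidyr)

# Invented long-format data: student "2" has no record for term 3
toy <- data.frame(
  id    = c("1", "1", "2"),
  curr  = c(1, 3, 1),
  value = c("ISDS", "GBUS", "MKT"),
  stringsAsFactors = FALSE
)

# One row per student, one column per term; gaps are filled with "None"
wide <- spread(toy, curr, value, fill = "None")
print(wide)
```

The resulting columns are named after the term numbers, which is why the plot call selects the term columns by position rather than by name.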

Color your chart by student, for easier tracking over time:

cl <- colors(distinct = TRUE)
color_palette <- sample(cl, length(ff3$id))

Plot:

alluvial(ff3[, 2:9],
         freq = 8,
         col = color_palette,
         blocks = TRUE,
         xw = 0.2, # makes the ribbons a bit wavier
         axis_labels = c("Term1", "Term2", "Term3", "Term4", "Term5", "Term6", "Term7", "Term8"))

Upvotes: 0
