Reputation: 3486
I am trying to use ggalluvial to track academic paths of students over semesters and see how students change curriculum over time.
This is a sample of my dataset:
structure(list(id = c("1", "2", "6", "8", "9", "10", "11", "12",
"14", "15", "1", "2", "6", "8", "9", "10", "11", "12", "14",
"15", "1", "2", "6", "8", "9", "10", "11", "12", "14", "15",
"1", "2", "6", "8", "9", "10", "11", "12", "14", "15", "1", "2",
"6", "8", "9", "10", "11", "12", "14", "15", "1", "2", "6", "8",
"9", "10", "11", "12", "14", "15", "1", "2", "6", "8", "9", "10",
"11", "12", "14", "15", "1", "2", "6", "8", "9", "10", "11",
"12", "14", "15"),
curr = c("CURR1", "CURR1", "CURR1", "CURR1",
"CURR1", "CURR1", "CURR1", "CURR1", "CURR1", "CURR1", "CURR3",
"CURR3", "CURR3", "CURR3", "CURR3", "CURR3", "CURR3", "CURR3",
"CURR3", "CURR3", "CURR5", "CURR5", "CURR5", "CURR5", "CURR5",
"CURR5", "CURR5", "CURR5", "CURR5", "CURR5", "CURR7", "CURR7",
"CURR7", "CURR7", "CURR7", "CURR7", "CURR7", "CURR7", "CURR7",
"CURR7", "CURR9", "CURR9", "CURR9", "CURR9", "CURR9", "CURR9",
"CURR9", "CURR9", "CURR9", "CURR9", "CURR11", "CURR11", "CURR11",
"CURR11", "CURR11", "CURR11", "CURR11", "CURR11", "CURR11", "CURR11",
"CURR13", "CURR13", "CURR13", "CURR13", "CURR13", "CURR13", "CURR13",
"CURR13", "CURR13", "CURR13", "CURR15", "CURR15", "CURR15", "CURR15",
"CURR15", "CURR15", "CURR15", "CURR15", "CURR15", "CURR15"),
value = c("ISDS", "ISDS", "GBUS", "ISDS", "GBUS", "ISDS",
"ACCT", "GBUS", "ITF", "MGT", "ISDS", "ISDS", "GBUS", "ISDS",
"MKT", "ISDS", "ACCT", "GBUS", "ITF", "MGT", "ISDS", "ISDS",
"ISDS", "ISDS", "MKT", "ISDS", "ACCT", "GBUS", "ISDS", "MGT",
"ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ACCT", "GBUS",
"ISDS", "ISDS", "ISDS", "ISDS", "ISDS", NA, "ISDS", "ISDS",
"ACCT", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS",
"ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", "ISDS", NA,
"ISDS", "ISDS", "ISDS", NA, "ISDS", "ISDS", "ISDS", "ISDS",
"ISDS", NA, "ISDS", "ISDS", "ISDS", NA, "ISDS", "ISDS", "ISDS",
NA)), class = "data.frame", row.names = c(NA, -80L), .Names = c("id",
"curr", "value"))
I would like to map:
CURR (a time variable) to the x-axis
value to different heights on the y-axis
the count of value for each CURR to the width of the flows
The diagram should show from which and into which curriculum students "flow" over time.
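To make the target concrete, the quantities the plot should encode can be tabulated directly. The sketch below uses a small hypothetical data frame (toy, not the real dataset) in the same long format as the question; the column names id, curr and value are taken from the question, everything else is an assumption for illustration.

```r
# Toy data in the same long format as the question (hypothetical values)
toy <- data.frame(
  id    = rep(c("1", "2", "3"), times = 2),
  curr  = rep(c("CURR1", "CURR3"), each = 3),
  value = c("ISDS", "GBUS", "GBUS", "ISDS", "ISDS", "GBUS"),
  stringsAsFactors = FALSE
)

# Students per curriculum in each term: what the stratum heights should show
counts <- table(toy$curr, toy$value)

# Transitions between two consecutive terms: what the flow widths should show
from <- toy[toy$curr == "CURR1", c("id", "value")]
to   <- toy[toy$curr == "CURR3", c("id", "value")]
wide <- merge(from, to, by = "id", suffixes = c("_from", "_to"))
flows <- table(wide$value_from, wide$value_to)

print(counts)
print(flows)
```

In this toy case, one of the two GBUS students moves to ISDS between the terms, so a correct diagram would show a GBUS-to-ISDS flow one student wide.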
This is what I have so far, which is pretty off
ggplot(as.data.frame(ff2),
aes(x=curr, axis1=value, group=id)) +
geom_alluvium(aes(fill = value))
The x-axis looks all right, but the weight does not reflect the different weights of the curricula over time, nor can I follow the students' "flows".
Upvotes: 2
Views: 740
Reputation: 718
Sorry for the delay. I just merged an experimental branch that includes a separate geom for plotting the "flows" between axes, instead of the full "alluvia" that span the entire diagram, plus a bunch of new parameters. This makes the kind of plot you're describing possible with the following code, assuming that ff2 is assigned the result of the structure() call in the OP.
library(ggplot2)
library(ggalluvial)
# keep the values of 'curr' in their proper order
ff2$curr <- factor(ff2$curr, levels = unique(ff2$curr))
ggplot(ff2, aes(
  # position aesthetics:
  # 'x' as in 'geom_bar()'
  # 'stratum' and 'alluvium' specific to ggalluvial
  x = curr, stratum = value, alluvium = id,
  # apply 'fill' colors to both flows and strata;
  # 'label' supplies the text that 'geom_text()' requires
  fill = value, label = value
)) +
  # flow parameters:
  # 'lode.guidance' says how to arrange splines in each stratum
  # 'aes.flow' says which axis determines flow aesthetics
  geom_flow(lode.guidance = "rightleft", aes.flow = "forward") +
  geom_stratum() +
  # include text labels at each stratum
  geom_text(stat = "stratum")
Thanks for pointing out this need, especially for handling NAs in a consistent way!
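As a side note on the factor() step in the code above: without it, R orders the term labels alphabetically, which puts CURR11 before CURR3 on the x-axis. A minimal base-R sketch of the difference, using only the eight term labels from the question (the variable names are illustrative):

```r
# The eight term labels, in the order they appear in the question's data
curr <- c("CURR1", "CURR3", "CURR5", "CURR7",
          "CURR9", "CURR11", "CURR13", "CURR15")

# Default factor levels are sorted as strings, so "CURR11" follows "CURR1"
default_order <- levels(factor(curr))

# Taking the levels from unique() keeps the order of first appearance
appearance_order <- levels(factor(curr, levels = unique(curr)))

print(default_order)
print(appearance_order)
```

This only works if the rows are already sorted by term, as they are in the sample data; otherwise the levels would need to be listed explicitly.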
Upvotes: 2
Reputation: 624
Try the following. It's not gorgeous, but it works. You can use base graphics to clean it up a bit.
Install the following packages if you haven't already, then load them:
library(alluvial)
library(tidyr)
Edit your data:
ff2$value[is.na(ff2$value)] <- "None" # Replace NAs with a category so they're not lost
ff2$curr <- as.numeric(substr(ff2$curr, 5, nchar(ff2$curr))) # Change your term labels to numeric for easy & correct ordering
ff3 <- spread(ff2, curr, value, fill = "None") #spread your df from long to wide format
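In recent tidyr releases spread() is superseded by pivot_wider(), so the same long-to-wide step could probably be written as below. This is a sketch on a small hypothetical data frame (toy), not the real ff2, and it assumes a tidyr version that provides pivot_wider() (1.0.0 or later).

```r
library(tidyr)

# Toy long-format data (hypothetical): student "2" has no row for term 3
toy <- data.frame(
  id    = c("1", "2", "1"),
  curr  = c(1, 1, 3),
  value = c("ISDS", "GBUS", "ISDS"),
  stringsAsFactors = FALSE
)

# One row per student, one column per term;
# missing student/term combinations are filled with "None"
toy_wide <- pivot_wider(
  toy,
  names_from  = curr,
  values_from = value,
  values_fill = "None"
)

print(toy_wide)
```

The result has the same shape as ff3: an id column followed by one column per term, which is the layout alluvial() expects.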
Color your chart by student, for easier tracking over time:
cl <- colors(distinct = TRUE)
color_palette <- sample(cl, length(ff3$id))
Plot:
alluvial(ff3[, 2:9],
         freq = 8,
         col = color_palette,
         blocks = TRUE,
         xw = 0.2, # makes the ribbons a bit wavier
         axis_labels = c("Term1", "Term2", "Term3", "Term4",
                         "Term5", "Term6", "Term7", "Term8"))
Upvotes: 0