I am trying to use an alluvial plot (Sankey diagram) to show how different categories change between two time periods. I was able to create a plot that makes sense to me when all the factor levels are represented in both time periods (pre and post), but with my data the plot looks strange after I change the order of the factors. I would also like to use the same fill colors for the categories in both time periods, but I could only change the colors for the first time period (pre). When I draw the plot, the colors I specify are not assigned to the factor levels I intended, even though the order of the boxes (strata) is correct.
Any help or advice on how to improve the plot, and on how to order the factor levels of both groups when not all categories are represented in both time periods, would be very helpful.
Here is the code:
library(ggplot2)
library(ggalluvial)
db <- read.table(text = "pre post freq
NE NE 0
NE DD 2
NE LC 5
NE NT 2
NE VU 3
NE EN 5
NE CR 1
DD NE 0
DD DD 3
DD LC 37
DD NT 10
DD VU 14
DD EN 3
DD CR 3
LC NE 0
LC DD 0
LC LC 18
LC NT 2
LC VU 1
LC EN 2
LC CR 0
NT NE 0
NT DD 1
NT LC 3
NT NT 8
NT VU 13
NT EN 5
NT CR 1
VU NE 0
VU DD 0
VU LC 1
VU NT 0
VU VU 7
VU EN 8
VU CR 3
EN NE 0
EN DD 0
EN LC 0
EN NT 0
EN VU 0
EN EN 0
EN CR 2
CR NE 0
CR DD 0
CR LC 1
CR NT 0
CR VU 0
CR EN 0
CR CR 2
", header = TRUE)
head(db)
# Order factor levels
levels(db$pre) <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
levels(db$post) <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
# Set colors for the plot
colors.p <- c("#282828", "#7C7C7C", "#20AB5F", "#3EFF00",
"#FBFF00", "#FFBD00", "#FF0C00")
# Plot
p <- ggplot(db,
aes(y = freq, axis1 = pre,
axis2 = post)) +
geom_alluvium(aes(fill = pre), show.legend = FALSE) +
geom_stratum(aes(fill = pre), color = "black", alpha = 0.5) +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("previous", "current"),
expand = c(0.3, 0.01)) +
scale_fill_manual(values = colors.p) +
theme_void() +
theme(
panel.background = element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_text(size = 15, face = "bold"),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.position = "none"
)
p
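A likely cause of the mismatched colors is that `levels(x) <- ...` renames the existing levels by position instead of reordering them. If `pre` and `post` are factors with the default alphabetical levels (what `read.table()` produced by default before R 4.0.0), the assignment silently relabels the data. Since R 4.0.0, `read.table()` returns character columns by default, so the assignment does not create a factor at all and the axes keep their alphabetical order. `factor(x, levels = ...)` reorders the levels without touching the values. A minimal base R illustration with made-up values:

```r
# Factor created from strings: levels default to alphabetical order
x <- factor(c("NE", "DD", "LC"))
levels(x)          # "DD" "LC" "NE"

# levels<- RENAMES levels by position: "DD" -> "NE", "LC" -> "DD", "NE" -> "LC"
y <- x
levels(y) <- c("NE", "DD", "LC")
as.character(y)    # "LC" "NE" "DD": the data values themselves changed

# factor(..., levels = ) REORDERS the levels and keeps every value intact
z <- factor(as.character(x), levels = c("NE", "DD", "LC"))
as.character(z)    # "NE" "DD" "LC"
levels(z)          # "NE" "DD" "LC"
```

Applied to the question's data, this would mean `db$pre <- factor(db$pre, levels = c("NE", "DD", "LC", "NT", "VU", "EN", "CR"))` and the same for `db$post`, in place of the two `levels(...) <-` lines.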
I gave it a shot with a different package that I am more familiar with (ggsankey). I also removed one category from each time point to show that the factors can still be reordered when a category is missing. Does this solve your issues? If not, please clarify what is still missing.
library(tidyverse)
library(ggsankey)
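set.seed(1)  # make the random example counts reproducible
db <- data.frame(pre = rep(c("DD", "LC", "NT",
"VU", "EN", "CR"), each = 6),
post = rep(c("DD", "LC", "NT",
"VU", "EN", "CR"), times = 6),
# the same six random nonzero counts are reused for every pre category
freq = rep(sample(1:20, 6), 6))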
db %>%
uncount(freq) %>%
filter(pre != "DD", post != "NT") %>%
make_long(pre, post) %>%
mutate(node = fct_relevel(node, "LC", "NT", "VU", "EN", "CR"),
next_node = fct_relevel(next_node, "DD", "LC", "VU", "EN", "CR")) %>%
ggplot(aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node))) +
geom_alluvial() +
scale_fill_manual(values = c("DD" = "#7C7C7C", "LC" = "#20AB5F", "NT" = "#3EFF00", "VU" = "#FBFF00", "EN" = "#FFBD00", "CR" = "#FF0C00"))
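For comparison, here is a sketch that stays with ggalluvial and applies the same ideas: one shared level vector passed to `factor()` for both axes, and a named color vector so that each color is tied to a category name rather than to a position. Coloring the boxes with `after_stat(stratum)` (the same computed variable the question already uses for the labels) gives matching colors on both axes. Dropping zero-count rows is an assumption about what makes the plot look strange; it removes zero-height strata such as NE on the post axis. The table is rebuilt with `expand.grid()` only to keep the block short; the counts are the question's.

```r
library(ggplot2)
library(ggalluvial)

# One level vector shared by both axes fixes the stratum order once
lvls <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")

# Named palette: each color belongs to a category name, not a position
colors.p <- c(NE = "#282828", DD = "#7C7C7C", LC = "#20AB5F", NT = "#3EFF00",
              VU = "#FBFF00", EN = "#FFBD00", CR = "#FF0C00")

# Question's table rebuilt compactly: pre varies slowest, post fastest
db <- expand.grid(post = lvls, pre = lvls, stringsAsFactors = FALSE)[, c("pre", "post")]
db$freq <- c(0, 2, 5, 2, 3, 5, 1,      # pre = NE
             0, 3, 37, 10, 14, 3, 3,   # pre = DD
             0, 0, 18, 2, 1, 2, 0,     # pre = LC
             0, 1, 3, 8, 13, 5, 1,     # pre = NT
             0, 0, 1, 0, 7, 8, 3,      # pre = VU
             0, 0, 0, 0, 0, 0, 2,      # pre = EN
             0, 0, 1, 0, 0, 0, 2)      # pre = CR

# Reorder (not rename) the levels, then drop rows with no observations
db$pre  <- factor(db$pre,  levels = lvls)
db$post <- factor(db$post, levels = lvls)
db <- subset(db, freq > 0)

p <- ggplot(db, aes(y = freq, axis1 = pre, axis2 = post)) +
  geom_alluvium(aes(fill = pre), show.legend = FALSE) +
  # Fill each box by its own label, so both axes use the same colors
  geom_stratum(aes(fill = after_stat(stratum)), color = "black", alpha = 0.5) +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("previous", "current"), expand = c(0.3, 0.01)) +
  scale_fill_manual(values = colors.p) +
  theme_void() +
  theme(axis.text.x = element_text(size = 15, face = "bold"),
        legend.position = "none")
p
```

The flows keep `fill = pre`, so each ribbon is colored by its origin category; mapping `fill = post` in `geom_alluvium()` would color them by destination instead.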
EDIT: The approach I posted earlier still works for your new data. You need to add the additional level ("NE") to the factor releveling for the pre time point and give it its own color (blue in this example). What error do you get with this data?
library(tidyverse)
library(ggsankey)
db <- read.table(text = "pre post freq
NE NE 0
NE DD 2
NE LC 5
NE NT 2
NE VU 3
NE EN 5
NE CR 1
DD NE 0
DD DD 3
DD LC 37
DD NT 10
DD VU 14
DD EN 3
DD CR 3
LC NE 0
LC DD 0
LC LC 18
LC NT 2
LC VU 1
LC EN 2
LC CR 0
NT NE 0
NT DD 1
NT LC 3
NT NT 8
NT VU 13
NT EN 5
NT CR 1
VU NE 0
VU DD 0
VU LC 1
VU NT 0
VU VU 7
VU EN 8
VU CR 3
EN NE 0
EN DD 0
EN LC 0
EN NT 0
EN VU 0
EN EN 0
EN CR 2
CR NE 0
CR DD 0
CR LC 1
CR NT 0
CR VU 0
CR EN 0
CR CR 2
", header = TRUE)
db %>%
uncount(freq) %>%
make_long(pre, post) %>%
mutate(node = fct_relevel(node, "DD", "LC", "NT", "NE", "VU", "EN", "CR"),
next_node = fct_relevel(next_node, "DD", "LC", "NT", "VU", "EN", "CR")) %>%
ggplot(aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node))) +
geom_alluvial() +
scale_fill_manual(values = c("DD" = "#7C7C7C", "LC" = "#20AB5F", "NT" = "#3EFF00", "VU" = "#FBFF00", "EN" = "#FFBD00", "CR" = "#FF0C00", "NE" = "blue"))
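The two `fct_relevel()` calls list different subsets of levels, and both have to be kept in sync with the color vector by hand. A sketch that defines the order and the palette once, assuming that `make_long()` returns `node` and `next_node` as columns that `factor()` can convert: `droplevels()` then removes any category that never occurs in a column (here NE on the post side), which replaces the per-column lists. The palette reuses the question's colors instead of blue for NE, and the table is rebuilt compactly with `expand_grid()` from the same counts.

```r
library(tidyverse)
library(ggsankey)

# Category order and colors are defined once and reused below
lvls <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
pal  <- setNames(c("#282828", "#7C7C7C", "#20AB5F", "#3EFF00",
                   "#FBFF00", "#FFBD00", "#FF0C00"), lvls)

# Same counts as above; expand_grid() varies pre slowest and post fastest
db <- expand_grid(pre = lvls, post = lvls) %>%
  mutate(freq = c(0, 2, 5, 2, 3, 5, 1,      # pre = NE
                  0, 3, 37, 10, 14, 3, 3,   # pre = DD
                  0, 0, 18, 2, 1, 2, 0,     # pre = LC
                  0, 1, 3, 8, 13, 5, 1,     # pre = NT
                  0, 0, 1, 0, 7, 8, 3,      # pre = VU
                  0, 0, 0, 0, 0, 0, 2,      # pre = EN
                  0, 0, 1, 0, 0, 0, 2))     # pre = CR

long <- db %>%
  uncount(freq) %>%                 # one row per observation; zero counts vanish
  make_long(pre, post) %>%
  mutate(node = factor(node, levels = lvls),
         next_node = factor(next_node, levels = lvls)) %>%
  droplevels()                      # drop categories absent from a column

ggplot(long, aes(x = x, next_x = next_x, node = node,
                 next_node = next_node, fill = node)) +
  geom_alluvial() +
  scale_fill_manual(values = pal)
```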