Reputation:
I have data of sales by year and model, which is visualized via Sankey chart. Now I am struggling to handle 2 issue:
code:
df <- data.frame (model = c("A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J"),
Year = c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020),
sales = c(450,678,456,344,984,456,234,244,655,789,234,567,234,567,232,900,1005,1900,450,345,567,235,456,345,144,333,555,777,111,444,222,223,445,776,331,788,980,1003,456,434,345,2222,3456,456,678,8911,4560,4567,4566,5555,6666,7777,8888,1233,1255,5677,3411,2344,6122,4533))
install.packages("remotes")
#remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
library(tidyverse)
plot <- ggplot(df, aes(x = Year,
node = model,
fill = model,
value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16)
ggplotly(plot)
Upvotes: 1
Views: 1064
Reputation: 18744
I'm absolutely certain that there is a better way, but it took me a while to get it working. I think this is what you were looking for.
I started with the ggplot
and ggplotly
objects that you have here. The primary purpose of this initial plot is to capture the colors. (I could have captured them a few different ways, but this was already done for me in your plot.)
library(ggsankey)
library(tidyverse)
library(plotly)
# df from the question is unchanged
# visualize the original
(plot <- ggplot(df,
aes(Year, node = model, fill = model, value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial",
color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16))
ggplotly(plot) -> plp
plp
#-------- colors --------
# collect the 10 colors
cols <- map_dfr(1:10, function(k){
nm <- plp$x$data[[k]]$name
filler <- plp$x$data[[k]]$fillcolor
c(nm = nm, filler = filler)
})
Then I divided the contents of the B model into 10 groups to ensure it was always the smallest bump. This allowed me to collect the stacked values for all of the other models, which is needed to push B to the bottom.
#-------------- splitting B -------------
df1 <- df %>% filter(model != "B") %>%
arrange(Year, sales)
df2 <- df %>% filter(model == "B") %>% # this gets used further down
arrange(Year)
# split B into 10 groups - keep on the bottom, then join the groups
# make the groups
ng <- vector(length = 10)
invisible(
map(1:10,
function(i) {
ng[i] <<- rep("B", i) %>% paste0(collapse = "")
})
)
# add values for these groups by year
df4 <- data.frame(Year = rep(unique(df$Year), each = 10),
model = rep(ng, length(unique(df$Year))),
sales = rep(df2$sales/10, each = 10))
df5 <- rbind(df1, df4)
Recreate the Sankey bump with 10 subsections of model B. Everything that follows works with this plot.
#-------------- plotly after dividing B -------------
(nplt <- ggplot(df5, aes(x = Year, node = model, fill = model, value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial",
color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16))
ggplotly(nplt) -> plt
plt
Create a Sankey bump with JUST B, to capture data that represents model B at the bottom. Use this data to substitute all of the traces that represent B in the object plt
. The colors get fixed here, as well. (The original 10 colors from the first plot.) Lastly, the hoverinfo
gets removed. That will get fixed next.
#-------------- get values for B at the bottom -------------
df %>% filter(model == "B") %>%
ggplot(aes(x = Year,
node = model,
fill = model,
value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16) -> bplt
ggplotly(bplt) -> bplotly
bplotly
#------- take divided B and remove all but one trace for B --------
# xx <- plt$x$data
# plt$x$data <- xx[c(1:2, 12:length(xx))] # keep only one B trace
#---------------- adjustments to plt's build --------------------
# change out data for the B trace, add the right colors
wh <- vector(length = 0)
invisible(
map(1:length(plt$x$data),
function(j) {
nm <- plt$x$data[[j]]$name
plt$x$data[[j]]$hoverinfo <<- "none"
plt$x$data[[j]]$fillcolor <<- unlist(cols[cols$nm == nm, "filler"],
use.names = F)
if(str_detect(nm, "^B$")){
plt$x$data[[j]]$x <<- bplotly$x$data[[1]]$x
plt$x$data[[j]]$y <<- bplotly$x$data[[1]]$y
}
if(str_detect(nm, "BB")) {
wh[length(wh) + 1] <<- j # list of unnecessary traces (extra B groups)
}
})
)
#----- take divided B and remove all but one trace for B ------
plt$x$data <- plt$x$data[-c(wh)] # <------ forget this line when updated last time
# visualize Sankey bump with B at the bottom
plt
The Plotly object is basically 10 globs of color, there is no separation between years in the background. So if you add a tooltip to this as it is, there can be only one...
To get the tooltips you're looking for, I created another trace (well, 10, actually—1 for each model). In order to get the right values (because the sales data isn't in the 50K range), I used the data in plt
to create a new data frame.
#--------------- collect values for hovertext positions ----------
x <- plt$x$data[[1]]$x
inds <- which(x %in% 2015:2020, arr.ind = T)
yrs <- x[inds]
tellMe <- invisible(
map(1:length(plt$x$data),
function(m) {
y <- plt$x$data[[m]]$y
y[inds]
}) %>% setNames(sort(unique(df$model))) %>% # changed from LETTERS[1:10]
as.data.frame() %>%
mutate(yr = yrs %>% as.integer()) %>%
pivot_longer(names_to = "model", values_to = "sales",
cols = sort(unique(df$model))) %>%
distinct() %>%
group_by(yr, model) %>%
summarise(val = mean(sales)) %>%
left_join(df, by = c("yr" = "Year", "model" = "model")) %>%
as.data.frame() # drop groups
)
#-------------- create data trace for hovertext --------------
plot_ly(tellMe, x = ~yr, y = ~val, split = ~model,
customdata = ~sales, text = ~model,
line = list(width = .01, shape = "spline", smoothing = 1.3),
hovertemplate = "Year: %{x}<br>Model: %{text}<br>Sales: %{customdata}<extra></extra>",
type = "scatter", mode = "lines", showlegend = F) -> pp2
pp2
If you look at the plot here, it looks blank. That's because of how small the lines are. This is intentional. You don't want lines on your graph.
Fix the colors, so that the hoverlabel
background colors match the legend colors.
# change colors to match sankey
pp2 <- plotly_build(pp2)
invisible(
map(1:10,
function(z) {
nm <- pp2$x$data[[z]]$name
# collect and assign the color
cr <- unlist(cols[cols$nm == nm, "filler"], use.names = F)
pp2$x$data[[z]]$line$color <<- cr
})
)
Using subplot
here didn't work. Plotly gave me an error when I tried adding a trace, whether all at once or even one for each model. So I forced the traces together.
#-------------- consolidate the traces (subplot won't work) -----------
# collect data one more time!
dx <- plt$x$data
yx <- pp2$x$data
yx <- append(yx, dx) # put plt on top
# replace data
plt$x$data <- yx
# lines are small, increase the distance searched for matches
plt %>% layout(hoverdistance = 40)
The final product:
Upvotes: 1