R: How to customize Sankey plot in ggplotly?

Question

I have sales data by year and model, visualized as a Sankey chart. I am struggling with two issues:

First, I need model B to always sit at the bottom of the chart, regardless of its value over the years.
Second, when I re-render the ggplot with ggplotly, the hover text does not show sales or years.

code:

df <- data.frame (model  = c("A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J"),
 Year = c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020),
                  sales = c(450,678,456,344,984,456,234,244,655,789,234,567,234,567,232,900,1005,1900,450,345,567,235,456,345,144,333,555,777,111,444,222,223,445,776,331,788,980,1003,456,434,345,2222,3456,456,678,8911,4560,4567,4566,5555,6666,7777,8888,1233,1255,5677,3411,2344,6122,4533))

# install.packages("remotes")
# remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
library(tidyverse)
library(plotly) # needed for ggplotly()

plot <- ggplot(df, aes(x = Year,
               node = model,
               fill = model,
               value = sales)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) 

  
  
ggplotly(plot)

Kat · Accepted Answer

I'm absolutely certain that there is a better way, but it took me a while to get it working. I think this is what you were looking for.

I started with the ggplot and ggplotly objects that you have here. The primary purpose of this initial plot is to capture the colors. (I could have captured them a few different ways, but this was already done for me in your plot.)

Update: I've modified the two elements you requested.

library(ggsankey)
library(tidyverse)
library(plotly)

# df from the question is unchanged

# visualize the original
(plot <- ggplot(df, 
              aes(Year, node = model, fill = model, value = sales)) + 
    geom_sankey_bump(space = 0, type = "alluvial", 
                     color = "transparent", smooth = 15) +
    scale_fill_viridis_d(option = "A", alpha = .8) +
    theme_sankey_bump(base_size = 16)) 

ggplotly(plot) -> plp 
plp

#-------- colors --------
# collect the 10 colors
cols <- map_dfr(1:10, function(k){
  nm <- plp$x$data[[k]]$name
  filler <- plp$x$data[[k]]$fillcolor
  c(nm = nm, filler = filler)
})
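
As mentioned above, the colors could be captured in other ways. One alternative is to ask viridisLite for the palette directly instead of reading it off the built plot. This is only a sketch: it assumes the discrete viridis scale assigns colors to the models in alphabetical order, and cols_alt and to_rgba are illustrative names that are not used anywhere else in this answer.

```r
library(viridisLite)

models <- LETTERS[1:10]  # assumption: the same ten models as df, in alphabetical order

# same palette settings as scale_fill_viridis_d(option = "A", alpha = .8)
hex <- viridis(length(models), option = "A", alpha = 0.8)

# hypothetical helper: convert "#RRGGBBAA" into an "rgba(r,g,b,a)" string
to_rgba <- function(h) {
  ch <- grDevices::col2rgb(h, alpha = TRUE)[, 1]
  sprintf("rgba(%d,%d,%d,%.2f)",
          ch[["red"]], ch[["green"]], ch[["blue"]], ch[["alpha"]] / 255)
}

# same column names as cols, so it could be swapped in for a comparison
cols_alt <- data.frame(nm = models,
                       filler = vapply(hex, to_rgba, character(1), USE.NAMES = FALSE))
cols_alt
```

Comparing cols_alt against cols is a reasonable check before relying on it, since the strings plotly reports may be formatted slightly differently.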

Then I divided the contents of the B model into 10 groups to ensure it was always the smallest bump. This allowed me to collect the stacked values for all of the other models, which is needed to push B to the bottom.

#-------------- splitting B -------------
df1 <- df %>% filter(model != "B") %>% 
  arrange(Year, sales)

df2 <- df %>% filter(model == "B") %>% # this gets used further down
  arrange(Year)

# split B into 10 groups - keep on the bottom, then join the groups
# make the groups
ng <- strrep("B", 1:10) # "B", "BB", "BBB", and so on up to ten B's
# add values for these groups by year
df4 <- data.frame(Year = rep(unique(df$Year), each = 10),
                  model = rep(ng, length(unique(df$Year))),
                  sales = rep(df2$sales/10, each = 10))
df5 <- rbind(df1, df4)
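
A quick way to convince yourself that the split does not change the totals is to sum the pieces back up per year. This is a base R sketch on a two-model toy subset of the data; toy, n_parts and b_parts are illustrative names.

```r
# toy subset of the question's data: models A and B, 2015-2016
toy <- data.frame(model = c("A", "B", "A", "B"),
                  Year  = c(2015, 2015, 2016, 2016),
                  sales = c(450, 678, 234, 567))

n_parts <- 10
b <- toy[toy$model == "B", ]

# repeat each B row n_parts times and give every copy an equal share
b_parts <- b[rep(seq_len(nrow(b)), each = n_parts), ]
b_parts$model <- rep(strrep("B", seq_len(n_parts)), times = nrow(b))
b_parts$sales <- b_parts$sales / n_parts

# the pieces should add back up to the original B sales for each year
totals <- tapply(b_parts$sales, b_parts$Year, sum)
totals
```

The same check can be run on df4 against df2 in the real data.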

Recreate the Sankey bump with 10 subsections of model B. Everything that follows works with this plot.

#-------------- plotly after dividing B -------------
(nplt <- ggplot(df5, aes(x = Year, node = model, fill = model, value = sales)) +
   geom_sankey_bump(space = 0, type = "alluvial", 
                    color = "transparent", smooth = 15) +
   scale_fill_viridis_d(option = "A", alpha = .8) +
   theme_sankey_bump(base_size = 16))

ggplotly(nplt) -> plt
plt

Create a Sankey bump with JUST B, to capture data that represents model B at the bottom. Use this data to substitute all of the traces that represent B in the object plt. The colors get fixed here, as well. (The original 10 colors from the first plot.) Lastly, the hoverinfo gets removed. That will get fixed next.

#-------------- get values for B at the bottom -------------
df %>% filter(model == "B") %>% 
  ggplot(aes(x = Year,
             node = model,
             fill = model,
             value = sales)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) -> bplt
ggplotly(bplt) -> bplotly
bplotly

#------- take divided B and remove all but one trace for B --------
# xx <- plt$x$data
# plt$x$data <- xx[c(1:2, 12:length(xx))] # keep only one B trace

#---------------- adjustments to plt's build --------------------
# change out data for the B trace, add the right colors
wh <- vector(length = 0)
invisible(
  map(1:length(plt$x$data),
      function(j) {
        nm <- plt$x$data[[j]]$name
        plt$x$data[[j]]$hoverinfo <<- "none"
        plt$x$data[[j]]$fillcolor <<- unlist(cols[cols$nm == nm, "filler"], 
                                             use.names = F)
        if(str_detect(nm, "^B$")){
          plt$x$data[[j]]$x <<- bplotly$x$data[[1]]$x
          plt$x$data[[j]]$y <<- bplotly$x$data[[1]]$y
        }
        if(str_detect(nm, "BB")) {
          wh[length(wh) + 1] <<- j # list of unnecessary traces (extra B groups)
        }
      })
)

#----- take divided B and remove all but one trace for B ------
plt$x$data <- plt$x$data[-c(wh)]         # <------ forgot this line in the last update

# visualize Sankey bump with B at the bottom
plt
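
One thing to be careful about with the negative index: in R, indexing with a negated empty vector returns an empty result, so plt$x$data[-c(wh)] would drop every trace if wh were ever empty. It is never empty here, but a logical index avoids the pitfall. A minimal sketch on a stand-in list (traces is a hypothetical placeholder for plt$x$data):

```r
# stand-in for plt$x$data: a list of traces, each with a name
traces <- list(list(name = "A"), list(name = "B"),
               list(name = "BB"), list(name = "BBB"), list(name = "C"))

nms <- vapply(traces, function(tr) tr$name, character(1))

# keep everything that is not one of the extra B groups
kept <- traces[!grepl("BB", nms)]
vapply(kept, function(tr) tr$name, character(1))

# the pitfall: a negative index built from an empty vector drops everything
wh_empty <- integer(0)
length(traces[-wh_empty])
```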

The Plotly object is basically 10 globs of color; there is no separation between years in the background. So if you add a tooltip to this as it is, there can be only one per glob.

To get the tooltips you're looking for, I created another set of traces, one for each model (10 in total). The hover points need y positions on the stacked scale of the plot, which runs up to roughly 50K, and the raw sales values aren't on that scale. So I used the data in plt to create a new data frame.

#--------------- collect values for hovertext positions ----------
x <- plt$x$data[[1]]$x
inds <- which(x %in% 2015:2020, arr.ind = T)
yrs <- x[inds]

tellMe <- invisible(
  map(1:length(plt$x$data),
      function(m) {
        y <- plt$x$data[[m]]$y
        y[inds]
      }) %>% setNames(sort(unique(df$model))) %>% # changed from LETTERS[1:10] 
    as.data.frame() %>% 
    mutate(yr = yrs %>% as.integer()) %>% 
    pivot_longer(names_to = "model", values_to = "sales", 
                 cols = sort(unique(df$model))) %>% 
    distinct() %>% 
    group_by(yr, model) %>% 
    summarise(val = mean(sales)) %>% 
    left_join(df, by = c("yr" = "Year", "model" = "model")) %>% 
    as.data.frame() # drop groups
)
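
The reason for averaging is easier to see on a toy outline. This sketch assumes (I have not confirmed it against the ggsankey or plotly documentation) that each trace is a closed band whose x values visit every year twice, once along the upper edge and once along the lower edge. The mean of the two y values is then the vertical middle of the band, which is a sensible spot for a hover point. All names and numbers here are made up.

```r
# made-up outline of one band: along the top edge, then back along the bottom
x <- c(2015, 2015.5, 2016, 2016, 2015.5, 2015)
y <- c(  10,     12,   14,    4,      3,    2)

# keep only the points that sit exactly on a year
inds <- which(x %in% 2015:2016)
edges <- data.frame(yr = x[inds], y = y[inds])

# one hover position per year: the middle of the band
mid <- aggregate(y ~ yr, data = edges, FUN = mean)
mid
```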

#-------------- create data trace for hovertext --------------
plot_ly(tellMe, x = ~yr, y = ~val, split = ~model, 
        customdata = ~sales, text = ~model,
        line = list(width = .01, shape = "spline", smoothing = 1.3),
        hovertemplate = "Year: %{x}<br>Model: %{text}<br>Sales: %{customdata}",
        type = "scatter", mode = "lines", showlegend = F) -> pp2
pp2
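
Plotly hover templates use HTML-style <br> tags for line breaks, and a trailing <extra></extra> hides the secondary box that otherwise shows the trace name. Adding <extra></extra> is optional and nothing in the code here relies on it. The sketch below only builds the string (hover_tmpl is an illustrative name), which could then be passed as the hovertemplate argument.

```r
# build the template line by line; <br> is the line break plotly understands
hover_tmpl <- paste0(
  paste("Year: %{x}", "Model: %{text}", "Sales: %{customdata}", sep = "<br>"),
  "<extra></extra>"  # optional: hides the trace-name box next to the label
)
hover_tmpl
```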

If you look at the plot here, it looks blank. That's because of how small the lines are. This is intentional. You don't want lines on your graph.

Fix the colors, so that the hoverlabel background colors match the legend colors.

# change colors to match sankey
pp2 <- plotly_build(pp2)
invisible(
  map(1:10,
      function(z) {
        nm <- pp2$x$data[[z]]$name
        # collect and assign the color
        cr <- unlist(cols[cols$nm == nm, "filler"], use.names = F)
        pp2$x$data[[z]]$line$color <<- cr
      })
)

Using subplot here didn't work. Plotly gave me an error when I tried adding a trace, whether all at once or even one for each model. So I forced the traces together.

#-------------- consolidate the traces (subplot won't work) -----------
# collect data one more time!
dx <- plt$x$data
yx <- pp2$x$data
yx <- append(yx, dx) # put plt on top

# replace data 
plt$x$data <- yx

# lines are small, increase the distance searched for matches
plt %>% layout(hoverdistance = 40)

The final product:
