Reputation: 129
I have been following examples on stack to create Sankey charts using the NetworkD3 package. I would like to set the colour of the links specific to column variables event
. The data that I have has multiple columns:
structure(list(names = c("bell", "john", "andrew", "sam", "bell",
"bell", "andrew"), event = c("Event 1", "Event 2", "Event 3",
"Event 1", "Event 2", "Event 4", "Event 1"), response = c("Yes",
"Yes", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA,
-7L))
> d
names event response
1 bell Event 1 Yes
2 john Event 2 Yes
3 andrew Event 3 No
4 sam Event 1 Yes
5 bell Event 2 No
6 bell Event 4 Yes
7 andrew Event 1 No
Following CJ Yetman's example (thank you very much for the very followable example if you ever read this!) here, data frames for the links and nodes were created:
links <- d %>%
mutate(row = row_number()) %>%
gather('column', 'source', -row) %>%
mutate(column = match(column, names(d))) %>%
group_by(row) %>%
arrange(column) %>%
mutate(target = lead(source)) %>%
ungroup %>%
filter(!is.na(target)) %>%
#index
mutate(source = paste0(source, '_', column)) %>%
mutate(target = paste0(target, '_', column + 1)) %>%
select(source, target)
nodes <- data.frame(
name=c(as.character(links$source), as.character(links$target)) %>%
unique()
)
Then I matched the indices in the nodes
data frame to the source
and target
columns in the links
data frame. In this case, I mutated new columns rather than overwriting it because it is easier for me to read:
link <- links %>%
mutate(IDsource = match(links$source, nodes$name)-1) %>%
mutate(IDtarget = match(links$target, nodes$name)-1) %>%
mutate(value = 1) %>%
#remove index
mutate(source = sub('_[0-9]+$', '', source)) %>%
mutate(target = sub('_[0-9]+$', '', target))
#
nodes$name <- sub('_[0-9]+$', '', nodes$name)
As mentioned, I would like to set the colour of the links across the entire flow of the Sankey diagram by event
. Here specifies creating a new column in links
and nodes
to match the colour scale specified in the d3.scaleOrdinal()
function.
#New column to set colour for nodes
nodes$node.col <- as.factor(c("nod.col")) #set nodes to one colour only
#New column for links
col.index <- d %>%
select(event) %>% unique(); col.index #get events from original dataframe
#mutate new column to set colours for links
links$links.col <- col.index$event[match(links$source, col.index$event)]
links$links.col[is.na(links$links.col)] <- col.index$event[match(links$target[is.na(links$links.col)], col.index$event)]
links$links.col <- as.factor(links$links.col) #change to factors
#d3.scaleOrdinal()
my_color <- 'd3.scaleOrdinal() .domain(["Event 1", "Event 2", "Event 3", "Event 4", "nod.col"]) .range(["red", "blue", "red", "yellow", "gray"])'
However, the links are all red, when each event should have its own colour (across the entire flows).
sankeyNetwork(Links = links, Nodes = nodes, Source = 'IDsource',
Target = 'IDtarget', Value = 'value', NodeID = 'name', colourScale=my_color,
LinkGroup="links.col", NodeGroup="node.col")
Could someone explain what I did wrong? The other issue I have is with manually setting the colours for each event in d3.scaleOrdinal()
. If there an efficient way to do this I need to set the colour for up to 30 or more events?
Upvotes: 1
Views: 1052
Reputation: 124863
To get the right colors use clean names in your color assignment, e.g. replace spaces in your column links.col
and the domain
of d3.scaleOrdinal
with e.g. an underscore. Unfortunately I have only basic knowledge of D3 and JS. So I can't tell you what's exactly the issue. But it works and was the only difference between your code and the example code from the R Graph Gallery:
library(tidyr)
library(dplyr)
library(networkD3)
# Make clean names
links$links.col <- as.factor(gsub(" ", "_", links$links.col)) #change to factors
my_color <- 'd3.scaleOrdinal() .domain(["Event_1", "Event_2", "Event_3", "Event_4", "nod.col"]) .range(["red", "blue", "red", "yellow", "grey"])'
sankeyNetwork(Links = links, Nodes = nodes, Source = 'IDsource',
Target = 'IDtarget', Value = 'value', NodeID = 'name', colourScale=my_color,
LinkGroup="links.col", NodeGroup="node.col")
#> Links is a tbl_df. Converting to a plain data frame.
EDIT Concerning the second part of your question. If you want custom colors for your events then I'm afraid some manual work is necessary to define the palette which assigns colors to events. However, one option to make this more "efficient" would be to first make a lookup table and then create the JS code by concatenating strings like so:
# Make a look up table of events and colors
cols <- data.frame(
domain = c("Event_1", "Event_2", "Event_3", "Event_4"),
color = c("red", "blue", "red", "yellow")
)
cols$domain <- sprintf('"%s"', cols$domain)
cols$color <- sprintf('"%s"', cols$color)
# Make the JS code by glueing strings
my_color <- c('d3.scaleOrdinal().domain([',
paste(c(cols$domain, '"nod.col"'), collapse = ", "),
"]) .range([",
paste(c(cols$color, '"grey"'), collapse = ", "),
"])")
my_color <- paste(my_color, collapse = "")
Upvotes: 2