anabelbs

Reputation: 33

Sankey Diagram with networkD3 package in R

I am trying to create a simple Sankey diagram following the instructions of the R Graph Gallery: https://www.r-graph-gallery.com/322-custom-colours-in-sankey-diagram.html. I have a dataset with two observations per ID. For each period I know whether someone is poor or not. The dataset looks like this:

ID   YEAR   POVERTY
1    2018      0
1    2019      1
2    2018      1
2    2019      1
3    2018      0
3    2019      1
4    2018      0
4    2019      0
5    2018      0
5    2019      0

I guess I would need to convert it into a source-target-value table, but I don't understand what "value" is for. Could someone explain it to me? How could I move forward with it?

Thank you very much in advance for your help :)

I have used the code that was provided:

library("dplyr", warn.conflicts = FALSE)
library("networkD3")

diagram <- SUBSET05%>% 
  dplyr::mutate(Poverty = dplyr::if_else(Poverty==1, "poor", "not poor")) %>% 
  dplyr::transmute(id_nmbr, yr_interview, Poverty = paste(Poverty, yr_interview, sep = "_"))

links <- diagram %>%
  tidyr::pivot_wider(names_from = yr_interview, values_from = Poverty) %>% 
  dplyr::rename(source = `2018`, target = `2019`)

nodes <- data.frame(name = unique(c(links$source, links$target))) %>% 
  tidyr::separate(name, into = c("group", "year"), sep = "_", remove = FALSE)

links$id_nmbrsource <- match(links$source, nodes$name)-1 
links$id_nmbrtarget <- match(links$target, nodes$name)-1
links$value <- 10

sn <- sankeyNetwork(Links = links,
                    Nodes = nodes,
                    NodeID = "name",
                    Source = "id_nmbrsource",
                    Target = "id_nmbrtarget",
                    NodeGroup = "group",
                    Value = "value")
sn 

and I am getting the following image:

[Image: the resulting Sankey diagram]

My dataset has 34034 observations, 17017 for each year. Do I have to change the value column because of this? What could be causing the ugly image?

Upvotes: 0

Views: 197

Answers (1)

giocomai

Reputation: 3518

I am not sure I really understand what you want your output to look like.

Either way, I do not think that "Value" is really important in your case. Each connection has the same importance, so you can set it to any arbitrary value.

If the point is just to show how many people moved between poverty and non-poverty, then the starting point should be the idea that you actually have four groups: "poor" and "not poor" in each of the two time periods.

The result would be something like this:

library("dplyr", warn.conflicts = FALSE)
library("networkD3")


df <- tibble::tribble(
  ~ID, ~YEAR, ~POVERTY,
  "1", 2018, 0,
  "1", 2019, 1,
  "2", 2018, 1,
  "2", 2019, 1,
  "3", 2018, 0,
  "3", 2019, 1,
  "4", 2018, 0,
  "4", 2019, 0,
  "5", 2018, 0,
  "5", 2019, 0
) %>% 
  dplyr::mutate(POVERTY = dplyr::if_else(POVERTY == 1, "poor", "not poor")) %>% # 1 = poor, as in the question's code
  dplyr::transmute(ID, YEAR, POVERTY = paste(POVERTY, YEAR, sep = "_")) 


links <- df %>% 
  tidyr::pivot_wider(names_from = YEAR, values_from = POVERTY) %>% 
  dplyr::rename(source = `2018`, target = `2019`) 

nodes <- data.frame(name = unique(c(links$source, links$target))) %>% 
  tidyr::separate(name, into = c("group", "year"), sep = "_", remove = FALSE)


links$IDsource <- match(links$source, nodes$name)-1 
links$IDtarget <- match(links$target, nodes$name)-1
links$value <- 10


sn <- sankeyNetwork(Links = links,
                    Nodes = nodes,
                    NodeID = "name",
                    Source = "IDsource",
                    Target = "IDtarget",
                    NodeGroup = "group",
                    Value = "value") 

sn

[Image: resulting Sankey diagram]
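With a dataset as large as the one mentioned in the question (17017 people), drawing one link per person may be hard to read. A possible variation, not part of the code above, is to count how many people follow each source-target path and use that count as "value", so that the width of each link reflects the number of people. This is only a sketch on the same toy data: the names `links_agg`, `nodes_agg` and `sn_agg` are made up for illustration, and it assumes that POVERTY == 1 means "poor".

```r
library("dplyr", warn.conflicts = FALSE)
library("networkD3")

# Same toy data as above (assumption: POVERTY == 1 means "poor")
df <- tibble::tribble(
  ~ID, ~YEAR, ~POVERTY,
  "1", 2018, 0,
  "1", 2019, 1,
  "2", 2018, 1,
  "2", 2019, 1,
  "3", 2018, 0,
  "3", 2019, 1,
  "4", 2018, 0,
  "4", 2019, 0,
  "5", 2018, 0,
  "5", 2019, 0
)

# One row per distinct source-target pair; value = number of people on that path
links_agg <- df %>%
  dplyr::mutate(POVERTY = dplyr::if_else(POVERTY == 1, "poor", "not poor")) %>%
  dplyr::transmute(ID, YEAR, POVERTY = paste(POVERTY, YEAR, sep = "_")) %>%
  tidyr::pivot_wider(names_from = YEAR, values_from = POVERTY) %>%
  dplyr::rename(source = `2018`, target = `2019`) %>%
  dplyr::count(source, target, name = "value")

nodes_agg <- data.frame(name = unique(c(links_agg$source, links_agg$target))) %>%
  tidyr::separate(name, into = c("group", "year"), sep = "_", remove = FALSE)

# networkD3 expects zero-based indices into the nodes table
links_agg$IDsource <- match(links_agg$source, nodes_agg$name) - 1
links_agg$IDtarget <- match(links_agg$target, nodes_agg$name) - 1

sn_agg <- sankeyNetwork(Links = as.data.frame(links_agg),
                        Nodes = nodes_agg,
                        NodeID = "name",
                        Source = "IDsource",
                        Target = "IDtarget",
                        NodeGroup = "group",
                        Value = "value")

sn_agg
```

On the real data this should give at most four links (one per combination of status in 2018 and status in 2019) instead of one per ID.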

Upvotes: 1
