Reputation: 153
I am trying to build a Sankey chart to visualize customer journeys on a website. My data has two fields: Session_ID and Page_Name. I capped the page depth at a maximum of 6 pages per session.
I was able to create the nodes, but not the links. Each link has to be of the form (source, target, frequency). Below is my data structure:
test_data = data.frame(session = rep(1:4, each = 4),
                       page = c("a","b","c","d", "a","c","d","e", "a","b","d","c", "a","d","e","f"))
This should be the final data:
a,b,2
b,c,1
c,d,2
a,c,1
d,e,2
b,d,1
d,c,1
a,d,1
e,f,1
Upvotes: 0
Views: 169
Reputation: 60130
You can do this using dplyr. Since the pages are in visit order within each session, you can use lead() to get the next page:
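library(dplyr)

test_data %>%
  group_by(session) %>%                # keep transitions within a single session
  mutate(next_page = lead(page)) %>%   # the page visited immediately afterwards
  ungroup() %>%
  filter(!is.na(next_page)) %>%        # the last page of a session has no successor
  count(page, next_page)               # frequency of each (page, next_page) pair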
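To get from that result to the (source, target, frequency) form asked for in the question, something like the sketch below might work. The column names source, target, frequency, source_id and target_id are my own choice, not anything required by dplyr, and the zero-based ids are only there on the assumption that the chart is drawn with a package such as networkD3, which expects them. It also assumes a dplyr version whose count() accepts the name argument.

```r
library(dplyr)

test_data <- data.frame(
  session = rep(1:4, each = 4),
  page = c("a","b","c","d", "a","c","d","e", "a","b","d","c", "a","d","e","f"),
  stringsAsFactors = FALSE
)

# One row per distinct transition, with the column names used in the question
links <- test_data %>%
  group_by(session) %>%
  mutate(target = lead(page)) %>%
  ungroup() %>%
  filter(!is.na(target)) %>%
  count(source = page, target, name = "frequency")

# Node table: one row per distinct page
nodes <- data.frame(name = sort(unique(test_data$page)), stringsAsFactors = FALSE)

# Zero-based positions of each page in the node table (assumed plotting requirement)
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1

# Assumption: with networkD3 installed, the chart could then be drawn with
# networkD3::sankeyNetwork(Links = as.data.frame(links), Nodes = nodes,
#                          Source = "source_id", Target = "target_id",
#                          Value = "frequency", NodeID = "name")
```

For the sample data this gives 9 distinct links whose frequencies add up to 12, i.e. three transitions for each of the four sessions.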
Upvotes: 2