bogdanCsn

Reputation: 1325

Transform sequence of data into JSON for D3.js visualization

I have a data frame that records a series of actions (column Action) performed by several users (column Id). The row order matters: it is the order in which the actions were performed. For each Id, the first action performed is start. Consecutive identical actions are possible (for example, the sequence start -> D -> D -> D is valid). Here is some code to generate sample data:

set.seed(10)

i <- 0
all_id <- NULL
all_vals <- NULL

while (i < 5) {
  i <- i + 1
  print(i)
  size <- sample(3:5, size = 1)
  tmp_id <- rep(i, times = size + 1)
  tmp_vals <- c("start",sample(LETTERS, size = size)  )

  all_id <- c(all_id, tmp_id)
  all_vals <- c(all_vals, tmp_vals)
}

df <- data.frame(Id = all_id,
                 Action = all_vals)

Goal: transform this data into a JSON object nested on multiple levels, to be used in a D3.js visualization (like this). I would like a counter for how many times each child appears under its respective parent (and maybe even a percentage of the parent's total appearances), but I hope I can do that myself.

Expected output below. It is generic, not derived from the data generated above, and the real data will have quite a lot of nested values (count and percentage are optional at this point):

  {
    "action": "start",
    "parent": "null",
    "count": "10",
    "percentage": "100",
    "children": [
      {
        "action": "H",
        "parent": "start",
        "count": "6",
        "percentage": "60",
        "children": [
          {
            "action": "D",
            "parent": "H",
            "count": "5",
            "percentage": "83.3"            
          },
          {
            "action": "B",
            "parent": "H",
            "count": "3",
            "percentage": "50"          
          }
        ]
      },
      {
        "action": "R",
        "parent": "start",
        "count": "4",
        "percentage": "40"
      }
    ]
  }

I know I am supposed to post something I've tried, but I really don't have anything remotely worth showing.

Upvotes: 0

Views: 126

Answers (1)

timelyportfolio

Reputation: 6579

I have just started writing some R -> d3.js converters in https://github.com/timelyportfolio/d3r that should work well in these types of situations. I will work up an example later today with your data.

The internal hierarchy builder in https://github.com/timelyportfolio/sunburstR also might work well here.

I'll add to the answer as I explore both of these paths.

example 1

set.seed(10)

i <- 0
all_id <- NULL
all_vals <- NULL

while (i < 5) {
  i <- i + 1
  print(i)
  size <- sample(3:5, size = 1)
  tmp_id <- rep(i, times = size + 1)
  tmp_vals <- c("start",sample(LETTERS, size = size)  )

  all_id <- c(all_id, tmp_id)
  all_vals <- c(all_vals, tmp_vals)
}

df <- data.frame(Id = all_id,
                 Action = all_vals)

# not sure I completely understand what this is
#  supposed to become but here is a first try

# find position of start
start_pos <- which(df$Action=="start")
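# get the sequences: each runs from the row after a "start"
#  to the row before the next "start" (or the last row of df)
#  surely there is a better way but do this for now
sequences <- paste(
  start_pos+1,
  c(start_pos[-1]-1,nrow(df)),
  sep=":"
)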
paths <- lapply(
  sequences,
  function(x){
    data.frame(
      t(as.character(df[eval(parse(text=x)),]$Action)),
      stringsAsFactors=FALSE
    )
  }
)
paths_df <- dplyr::bind_rows(paths)

# use d3r
# devtools::install_github("timelyportfolio/d3r")
library(d3r)
d3_nest(paths_df) # if want list, then json=FALSE

# visualize with listviewer
# devtools::install_github("timelyportfolio/listviewer")
listviewer::jsonedit(d3_nest(paths_df))
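
The count and percentage fields from the question are not produced by the code above. As a rough sketch of one way to get them in base R: the helper `build_node` below is hypothetical (not part of d3r or sunburstR), and it assumes that "count" means the number of users whose sequence reaches a node through the same prefix, and that "percentage" is that count as a share of the parent's count. It uses a small hand-written data frame so the result does not depend on the random sample.

```r
# Hypothetical helper: recursively build a nested list (prefix tree)
# from a list of character vectors that all start with `action`.
build_node <- function(seqs, action, parent, parent_count) {
  count <- length(seqs)
  node <- list(
    action = action,
    parent = parent,
    count = count,
    percentage = round(100 * count / parent_count, 1)
  )
  # drop the action just consumed and keep only sequences that continue
  rest <- lapply(seqs, function(s) s[-1])
  rest <- rest[lengths(rest) > 0]
  if (length(rest) > 0) {
    # group the remaining sequences by their next action
    first <- vapply(rest, function(s) s[1], character(1))
    groups <- split(rest, first)
    node$children <- unname(lapply(names(groups), function(a) {
      build_node(groups[[a]], a, action, count)
    }))
  }
  node
}

# small illustrative data set (not the data from the question)
df <- data.frame(
  Id = c(1, 1, 1, 2, 2, 2, 3, 3),
  Action = c("start", "H", "D", "start", "H", "B", "start", "R"),
  stringsAsFactors = FALSE
)

# one character vector per Id, in row order
seqs <- split(as.character(df$Action), df$Id)
tree <- build_node(seqs, "start", "null", length(seqs))

# convert to JSON if jsonlite is available
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::toJSON(tree, auto_unbox = TRUE, pretty = TRUE))
}
```

Note that this emits count and percentage as numbers rather than the quoted strings shown in the expected output, and that children are ordered alphabetically by action because of how `split` groups them.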

Upvotes: 1
