How can I determine robust slice parameters for dynamic network renderings in R using ndtv?

Question

I strive to produce a visual dynamic animation of timestamped transactions, where each transaction represents a contribution of a person to an artifact/file. To this end, I am using the R packages networkDynamic, network and ndtv.

The transactions have (in contrast to the examples in the networkDynamic package vignette) "real" timestamps. I want to wrap the rendering process inside a function that

starts rendering at the beginning of a "natural time frame" such as a day or a week (which most probably is not the timestamp of the first event)
renders "natural" labels to the players timeline instead of integers
uses "natural" slices such as a week/month/year based on the input data

I think I have managed to make the first slice start at the beginning of the week of the first event using lubridates floor_date. I have not looked into the last issue yet (labelling), because unfortunately, I have troubles to determine proper slicing parameters for my data set.

Please find below a reproducible example for RStudio. The example includes three lists named slice.par, one that does work, and two that don't. Simply hardcoding a parameter configuration that (only) works with the concrete example is not my goal, firstly because my real data set is much bigger (and therefore 'playing around' with the parameters costs much time) and secondly because I would like to have a function that works with many different data sets.

if (!require("pacman")) install.packages("pacman")
library("pacman")
pacman::p_load(network, networkDynamic, ndtv, lubridate)

UtilNumericAsDate <- function(nuUnixTimestamp) {
    return(as.POSIXct(nuUnixTimestamp, origin = "1970-01-01 00:00.00 UTC", tz = "UTC"))
}

UtilDateAsNumeric <- function(oTimestamp) {
    return(as.numeric(as.POSIXct(oTimestamp)))
}

stTransac <- "
'contributorId', 'artifactId', 'weight', 'instantId'
'A', 'a1', '1', '2003-06-01 23:09:40'
'A', 'a2', '1', '2004-02-27 11:48:41'
'A', 'a1', '2', '2006-06-25 20:36:49'
'A', 'a3', '1', '2007-01-28 00:35:31'
'A', 'a3', '2', '2007-04-25 16:03:57'
'A', 'a3', '3', '2007-07-19 19:43:49'
'B', 'a1', '1', '2008-02-06 12:37:56'
'C', 'a3', '1', '2008-04-07 02:27:36'
'C', 'a2', '1', '2008-06-01 02:15:35'
'C', 'a2', '2', '2008-10-05 02:32:45'
'B', 'a1', '2', '2009-06-22 01:57:45'
'C', 'a4', '1', '2009-09-15 02:56:33'
'C', 'a5', '1', '2010-06-30 19:42:25'
'C', 'a6', '1', '2011-06-12 23:58:17'
'B', 'a3', '1', '2013-08-30 19:34:28'
'C', 'a1', '1', '2014-10-23 20:49:54'
'C', 'a1', '2', '2014-10-24 16:46:07'
'A', 'a2', '2', '2015-09-26 16:58:17'
'A', 'a7', '1', '2015-10-04 17:40:12'
'A', 'a8', '1', '2015-12-02 10:55:47'
"

dfTransac <- read.csv(text = stTransac, sep = "," , quote = '\'' , strip.white = TRUE, stringsAsFactors = FALSE)

dfEdges <- unique(dfTransac[,1:2])
veUniqueContributors   <- unique(dfEdges[[1]])
veUniqueArtifacts      <- unique(dfEdges[[2]])
nuNrUniqueContributors <- length(veUniqueContributors)
nuNrUniqueArtifacts    <- length(veUniqueArtifacts)

net <- network.initialize(0, directed = TRUE, bipartite = length(veUniqueContributors))

add.vertices.networkDynamic(net, nuNrUniqueContributors, vertex.pid = veUniqueContributors)
add.vertices.networkDynamic(net, nuNrUniqueArtifacts, vertex.pid = veUniqueArtifacts)

net %v% "vertex.names" <- c(veUniqueContributors, veUniqueArtifacts)
net %v% "vertex.type"  <- c(rep("p", length(veUniqueContributors)), rep("a", length(veUniqueArtifacts)))
net %v% "vertex.col"   <- c(rep("blue", length(veUniqueContributors)), rep("gray", length(veUniqueArtifacts)))
net %v% "vertex.sides" <- c(rep(8, length(veUniqueContributors)), rep(4, length(veUniqueArtifacts)))
net %v% "vertex.rot"   <- c(rep(0, length(veUniqueContributors)), rep(45, length(veUniqueArtifacts)))
net %v% "vertex.lwd"   <- c(rep(1, length(veUniqueContributors)), rep(0, length(veUniqueArtifacts)))
net %v% "vertex.cex"   <- c(rep(2, length(veUniqueContributors)), rep(1, length(veUniqueArtifacts)))
set.network.attribute(net,'vertex.pid','vertex.names')
set.network.attribute(net,'edge.pid','edge.names')

add.edges.networkDynamic(net,
                         tail = get.vertex.id(net, dfEdges[[1]]),
                         head = get.vertex.id(net, dfEdges[[2]]),
                         edge.pid = paste0(dfEdges[[1]], "->", dfEdges[[2]]))

activate.edges(net,
               e = get.edge.id(net, paste0(dfTransac[["contributorId"]], "->", dfTransac[["artifactId"]])),
               at = UtilDateAsNumeric(dfTransac$instantId))

activate.edge.attribute(net,
                        prefix = "weight",
                        value = dfTransac$weight,
                        e = get.edge.id(net, paste0(dfTransac[["contributorId"]], "->", dfTransac[["artifactId"]])),
                        at = UtilDateAsNumeric(dfTransac$instantId))

reconcile.vertex.activity(net = net, mode = "encompass.edges", edge.active.default = FALSE)

nuStart      <- range(get.change.times(net, ignore.inf = FALSE))[1]
nuEnd        <- range(get.change.times(net, ignore.inf = FALSE))[2]

nuWeekStart  <- UtilDateAsNumeric(floor_date(UtilNumericAsDate(nuStart), "week"))
nuWeekEnd    <- UtilDateAsNumeric(ceiling_date(UtilNumericAsDate(nuEnd), "week"))

# This doesn't work: "Monthly" slices, 5 year aggregation
# Error: Attribute 'vertex.sides' had illegal missing values for vertex.sides or was not present in plot.network.default.
slice.par <- list(start = nuWeekStart,
                          end = nuWeekEnd,
                          interval = 1*60*60*24*7*4.5,
                          aggregate.dur = 1*60*60*24*7*52*5,
                          rule = "any")

# This doesn't work either: "Bimonthly" slices, "Bimonthly" aggregation
# Error: Attribute 'weight' had illegal missing values for edge.lwd or was not present in plot.network.default.
slice.par <- list(start = nuWeekStart,
                          end = nuWeekEnd,
                          interval = 1*60*60*24*7*4.5*2,
                          aggregate.dur = 1*60*60*24*7*4.5*2,
                          rule = "any")

# This works: "Bimonthly" slices, 5 year aggregation
slice.par <- list(start = nuWeekStart,
                          end = nuWeekEnd,
                          interval = 1*60*60*24*7*4.5*2,
                          aggregate.dur = 1*60*60*24*7*52*5,
                          rule = "any")

compute.animation(net, animation.mode = "kamadakawai", slice.par = slice.par, default.dist = 10)

render.d3movie(net,
               slice.par = slice.par,
               displaylabels = TRUE,
               output.mode = "htmlWidget",
               usearrows = TRUE,
               vertex.col = 'vertex.col',
               vertex.sides = 'vertex.sides',
               vertex.cex = 'vertex.cex',
               vertex.rot = 'vertex.rot',
               edge.lwd = 'weight',
               render.par = list(tween.frames = 10, show.time = TRUE))

How can I derive proper slicing parameters from the data set so that the rendering process does not choke on individual slices that miss attributes or edges without simply increasing the aggregation duration?

skyebend · Accepted Answer

As you have already established, there is a bug in the render.d3movie function. It is trying to look up values for an 'empty' slice (a time range that includes no active vertices), see the bug report at Github. (I'm not actually able to reproduce the error with your code above, but it is definitely a bug, thanks for reporting)

How can I derive proper slicing parameters from the data set so that the rendering process does not choke on individual slices that miss attributes or edges without simply increasing the aggregation duration?

Until the bug is fixed (hopefully soon), you could

a) use render.animation instead

b) choose your slice parameters to ensure that there are no slice where the network has no active vertices. You can use the timeline function to see where slices will land. For example, to show the activity spells for nodes (blue) and edges (purple) along with the slice bins (vertical gray bars):

timeline(net,slice.par=slice.par,main='timeline plot of activity spells')

c) the best solution might be to tweak the vertex activity to ensure that there is always a vertex active in each time slice. In this case there are some vertices that were assigned very short durations by reconcile.vertex.activity because they only encompass edges of very short duration. Using a different rule might avoid that, or setting vertices to be always active once they appear (if that makes sense for your data).

Some other notes:

You probably also need to set the slice.par$rule value to earliest instead of any so that when it encounters multiple possible values when binning the dynamic weight attribute on the edges it will know which one to choose.

BTW, I think there may be a more compact way to construct your network using the networkDynamic utility function and passing in stTransac, and it will probably be faster at loading a large data set.

How can I determine robust slice parameters for dynamic network renderings in R using ndtv?

Answers (1)

Related Questions