RgrNormand

Reputation: 530

Dynamic Bayesian Network - multivariate - repetitive events - bnstruct R Package

I am looking for an approach to train a dynamic Bayesian network (DBN), using the package bnstruct, for a special case where data is collected from similar events. Being so, 1) I would like to train my DBN by feeding it one event at a time.

As in the real case the number of events, rows and columns is large, 2) it would be better if some parallel processing could be implemented to improve performance.

Dummy code is provided below; note that all data must be fed at once, disregarding event boundaries.

library(bnstruct)

numEvents <- 40
numRows <- 5
numCols <- 3

mat <- matrix(data = rnorm(numEvents * numRows * numCols), ncol = numCols)
varNames <- paste0("var", 1:numCols)
colnames(mat) <- varNames

dataset <- BNDataset(data = mat,
                     discreteness = rep(FALSE, ncol(mat)),
                     variables = varNames,
                     node.sizes = rep(3, ncol(mat)))

dbn <- learn.dynamic.network(dataset, num.time.steps = numCols)

Thanks.

Upvotes: 2

Views: 1706

Answers (1)

Alberto Franzin

Reputation: 231

The data you are generating is treated in bnstruct as a DBN with 3 layers, each consisting of a single node. The right way of treating a dataset as a sequence of events is to consider variable X in event i as a different variable from the same variable X in event j, as learn.dynamic.network is just a proxy for learn.network with an implicit layering. That is, your dataset doesn't have to be constructed by adding rows, but by adding columns. Section 4.1.2 of the vignette has an explanation of how to learn a DBN.

The right way of constructing and using a dataset in your example is

mat <- matrix(data = rnorm(numEvents * numRows * numCols), ncol = numCols * numEvents)
varNames <- rep(paste0("var", 1:numCols), numEvents)
colnames(mat) <- varNames

dataset <- BNDataset(data = mat,
                     discreteness = rep(FALSE, ncol(mat)),
                     variables = varNames,
                     node.sizes = rep(3, ncol(mat)))

dbn <- learn.dynamic.network(dataset, num.time.steps = numEvents)

dbn will have 120 effective nodes, divided into 40 layers.
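As a quick sanity check on the reshaping (a standalone sketch, independent of bnstruct):

```r
numEvents <- 40; numRows <- 5; numCols <- 3

# one column per variable per event, one row per observation
mat <- matrix(rnorm(numEvents * numRows * numCols), ncol = numCols * numEvents)

ncol(mat)  # 120 nodes: 3 variables x 40 events
nrow(mat)  # 5 observations
```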

Coming to the first question: one idea is to provide an initial network as starting point for the successive time steps. Assuming the dataset at time step t+1 is obtained by adding new columns to the dataset used at time step t, you have to manually adapt the BN object to represent the dataset.

From the package vignette:

It is also possible to provide an initial network as starting point for the structure search. This can be done using the initial.network argument, which accepts three kinds of inputs:

  • a BN object (with a structure);
  • a matrix containing the adjacency matrix representing the structure of a network;
  • the string "random.chain" for starting from a randomly sampled chain-like network.
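For instance, the string form from the vignette excerpt would look like this (a sketch, reusing the dataset construction from the example above):

```r
library(bnstruct)

numEvents <- 40; numRows <- 5; numCols <- 3

mat <- matrix(rnorm(numEvents * numRows * numCols), ncol = numCols * numEvents)
colnames(mat) <- rep(paste0("var", 1:numCols), numEvents)

dataset <- BNDataset(data = mat,
                     discreteness = rep(FALSE, ncol(mat)),
                     variables = colnames(mat),
                     node.sizes = rep(3, ncol(mat)))

# start the structure search from a randomly sampled chain-like network
dbn <- learn.dynamic.network(dataset,
                             initial.network = "random.chain",
                             num.time.steps = numEvents)
```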

The simplest option is probably to expand the DAG with 0s at every augmentation, so that the new network has more nodes and no edges going into the new nodes, and to use that new DAG as the starting point. In your example:

library(bnstruct)

numEvents <- 40
numRows <- 5
numCols <- 3

mat <- matrix(data = rnorm(numRows * numCols), ncol = numCols)
varNames <- paste0("var", 1:numCols)
colnames(mat) <- varNames

dataset <- BNDataset(data = mat,
                     discreteness = rep(FALSE, ncol(mat)),
                     variables = varNames,
                     node.sizes = rep(3, ncol(mat)))

dbn <- learn.network(dataset)

for (event in 2:numEvents) {

    # collect new data
    new.mat <- matrix(data = rnorm(numRows * numCols), ncol = numCols)
    colnames(new.mat) <- paste0(varNames, "_", event)
    mat <- cbind(mat, new.mat)
    dataset <- BNDataset(data = mat,
                         discreteness = rep(FALSE, ncol(mat)),
                         variables = colnames(mat),
                         node.sizes = rep(3, ncol(mat)))

    # expand structure of the DBN, adding the nodes relative to the new event
    dbn.dag <- dag(dbn)
    n.nodes <- ncol(dbn.dag)
    new.dag <- matrix(0, nrow=ncol(mat), ncol=ncol(mat))
    new.dag[1:n.nodes, 1:n.nodes] <- dbn.dag

    # learn
    dbn <- learn.dynamic.network(dataset,
                                 initial.network = new.dag,
                                 num.time.steps = event)

}

This will, however, re-learn the whole DBN every time. If edges can only go to the immediately following layer, you can trim the search space by providing a layer.struct parameter, or by learning two events at a time and manually building the larger DBN.
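A sketch of the layer.struct idea, for two consecutive events at a time (the exact semantics of layer.struct, e.g. whether the diagonal permits intra-layer edges, should be checked against the vignette):

```r
library(bnstruct)

numRows <- 5; numCols <- 3

# two consecutive events side by side, one layer per event
mat2 <- matrix(rnorm(numRows * numCols * 2), ncol = numCols * 2)
colnames(mat2) <- rep(paste0("var", 1:numCols), 2)

dataset2 <- BNDataset(data = mat2,
                      discreteness = rep(FALSE, ncol(mat2)),
                      variables = colnames(mat2),
                      node.sizes = rep(3, ncol(mat2)))

# 0/1 matrix over layers: allow edges only from layer 1 into layer 2
ls <- matrix(c(0, 1,
               0, 0), nrow = 2, byrow = TRUE)

dbn2 <- learn.dynamic.network(dataset2,
                              num.time.steps = 2,
                              layer.struct = ls)
```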

For the second question, bnstruct at the moment does not provide parallel processing.
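As a workaround (not a bnstruct feature), the per-pair learning mentioned above could be parallelized at the R level with the base parallel package; this is only a sketch, where `pairs` is assumed to be a list of matrices, each holding the columns of two consecutive events:

```r
library(bnstruct)
library(parallel)

# learn one small two-layer network per pair of consecutive events;
# the resulting DAGs would then be combined into the larger DBN manually
learn.pair <- function(mat.pair) {
  ds <- BNDataset(data = mat.pair,
                  discreteness = rep(FALSE, ncol(mat.pair)),
                  variables = colnames(mat.pair),
                  node.sizes = rep(3, ncol(mat.pair)))
  dag(learn.dynamic.network(ds, num.time.steps = 2))
}

# mclapply forks worker processes on Unix-alikes (it runs serially on Windows)
# dags <- mclapply(pairs, learn.pair, mc.cores = 4)
```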

Upvotes: 4
