Reputation: 530
I am looking for an approach to train a dynamic Bayesian network (DBN), using the package bnstruct, for a special case where data is collected from similar events. Specifically, 1) I would like to train my DBN by feeding it one event at a time.
Since in the real case the numbers of events, rows and columns are large, 2) it would be better if some parallel processing could be used to improve performance.
Dummy code is provided below, where all data must be fed at once, disregarding event boundaries.
library(bnstruct)
numEvents <- 40
numRows <- 5
numCols <- 3
mat <- matrix(data = rnorm(numEvents * numRows * numCols), ncol = numCols)
varNames <- paste0("var", 1:numCols)
colnames(mat) <- varNames
dataset <- BNDataset(data = mat, discreteness = rep(F, ncol(mat)), variables = varNames, node.sizes = rep(3, ncol(mat)))
dbn <- learn.dynamic.network(dataset, num.time.steps = numCols)
Thanks.
Upvotes: 2
Views: 1706
Reputation: 231
The data you are generating is treated in bnstruct as a DBN with 3 layers, each consisting of a single node. The right way of treating a dataset as a sequence of events is to consider variable X in event i as a different variable from the same variable X in event j, because learn.dynamic.network is just a proxy for learn.network with an implicit layering. That is, your dataset should be constructed by adding columns for each new event, not by adding rows.
Section 4.1.2 of the vignette has an explanation of how to learn a DBN.
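If the data already exists in the row-stacked layout from the question, it can be reshaped into the column-wise layout with base R alone. The sketch below is only an illustration: long_to_wide is a hypothetical helper, not part of bnstruct, and it assumes that every event contributes the same number of rows, that row k of each event belongs to the same observed sequence, and that the input matrix has column names.

```r
# Hypothetical helper (not part of bnstruct): turn a "long" matrix, where
# events are stacked on top of each other as blocks of rows, into a "wide"
# matrix with one block of columns per event.
# Assumes long.mat has column names and equally sized events.
long_to_wide <- function(long.mat, num.events) {
  stopifnot(nrow(long.mat) %% num.events == 0)
  rows.per.event <- nrow(long.mat) %/% num.events
  blocks <- lapply(seq_len(num.events), function(e) {
    # rows belonging to event e
    idx <- ((e - 1) * rows.per.event + 1):(e * rows.per.event)
    block <- long.mat[idx, , drop = FALSE]
    # make the variable names unique per event, e.g. var1_2
    colnames(block) <- paste0(colnames(long.mat), "_", e)
    block
  })
  do.call(cbind, blocks)
}

numEvents <- 4
numRows <- 5
numCols <- 3
long.mat <- matrix(rnorm(numEvents * numRows * numCols), ncol = numCols)
colnames(long.mat) <- paste0("var", 1:numCols)

wide.mat <- long_to_wide(long.mat, numEvents)
dim(wide.mat)  # numRows rows, numCols * numEvents columns
```

The resulting wide.mat has one column per (variable, event) pair, which is the shape the dataset needs before it is wrapped in a BNDataset.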
The right way of constructing and using a dataset in your example is
mat <- matrix(data = rnorm(numEvents * numRows * numCols), ncol = numCols * numEvents)
varNames <- rep(paste0("var", 1:numCols), numEvents)
colnames(mat) <- varNames
dataset <- BNDataset(data = mat, discreteness = rep(F, ncol(mat)), variables = varNames, node.sizes = rep(3, ncol(mat)))
dbn <- learn.dynamic.network(dataset, num.time.steps = numEvents)
dbn will have 120 effective nodes, divided into 40 layers.
Coming to the first question: one idea is to provide an initial network as the starting point for the successive time steps. Assuming the dataset at time step t+1 is obtained by adding new columns to the dataset used at time step t, you have to manually adapt the BN object to represent the new dataset.
From the package vignette:
It is also possible to provide an initial network as starting point for the structure search. This can be done using the initial.network argument, which accepts three kinds of inputs:
- a BN object (with a structure);
- a matrix containing the adjacency matrix representing the structure of a network;
- the string random.chain for starting from a randomly sampled chain-like network.
The simplest option is probably to keep the learned DAG and expand it with 0s at every augmentation. This gives a network with more nodes and no edges involving the new nodes, which you can then use as the starting point. In your example:
library(bnstruct)
numEvents <- 40
numRows <- 5
numCols <- 3
mat <- matrix(data = rnorm(numRows * numCols), ncol = numCols)
varNames <- paste0("var", 1:numCols)
colnames(mat) <- varNames
dataset <- BNDataset(data = mat,
                     discreteness = rep(FALSE, ncol(mat)),
                     variables = varNames,
                     node.sizes = rep(3, ncol(mat)))
dbn <- learn.network(dataset)
for (event in 2:numEvents) {
  # collect new data
  new.mat <- matrix(data = rnorm(numRows * numCols), ncol = numCols)
  colnames(new.mat) <- paste0(varNames, "_", event)
  mat <- cbind(mat, new.mat)
  dataset <- BNDataset(data = mat,
                       discreteness = rep(FALSE, ncol(mat)),
                       variables = colnames(mat),
                       node.sizes = rep(3, ncol(mat)))
  # expand structure of the DBN, adding the nodes relative to the new event
  dbn.dag <- dag(dbn)
  n.nodes <- ncol(dbn.dag)
  new.dag <- matrix(0, nrow = ncol(mat), ncol = ncol(mat))
  new.dag[1:n.nodes, 1:n.nodes] <- dbn.dag
  # learn
  dbn <- learn.dynamic.network(dataset,
                               initial.network = new.dag,
                               num.time.steps = event)
}
This will, however, re-learn the whole DBN every time. If edges can go only to the immediately following layer, you can trim the search space by providing a layer.struct parameter, or by learning from two events at a time and manually building the larger DBN.
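As a rough illustration of the layer.struct idea, the base R sketch below builds a 0/1 matrix over layers that only allows edges within a layer and from each layer to the next one. make_layer_struct and its allow.within argument are hypothetical names, not part of bnstruct. The convention assumed here, that cell (i, j) is 1 when edges from layer i to layer j are allowed, should be checked against the package documentation, as should the way the matrix interacts with learn.dynamic.network.

```r
# Hypothetical helper (not part of bnstruct): build a 0/1 matrix over layers.
# ASSUMPTION: cell (i, j) == 1 means edges from layer i to layer j are allowed.
make_layer_struct <- function(num.layers, allow.within = TRUE) {
  allowed <- matrix(0, nrow = num.layers, ncol = num.layers)
  # optionally allow edges between nodes of the same layer
  if (allow.within) diag(allowed) <- 1
  if (num.layers > 1) {
    # allow edges from layer i to layer i + 1 only
    idx <- cbind(1:(num.layers - 1), 2:num.layers)
    allowed[idx] <- 1
  }
  allowed
}

layer.struct <- make_layer_struct(4)
layer.struct
```

With 4 layers this gives 1s on the diagonal and on the first superdiagonal, so no edge can skip a layer or point backwards.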
For the second question, bnstruct at the moment does not provide parallel processing.
Upvotes: 4