I have two questions about Bayesian networks with the bnlearn package in R.
library(bnlearn)   # provides boot.strength() and the learning.test data set
library(parallel)
cl = makeCluster(4)
set.seed(1)
b1 = boot.strength(data = learning.test, R = 5, algorithm = "hc", cluster = cl, algorithm.args = list( score="bde", iss = 60 , restart=10, perturb = 5 ))
cl = makeCluster(4)
set.seed(1)
b2 = boot.strength(data = learning.test, R = 5, algorithm = "hc", cluster = cl, algorithm.args = list( score="bde", iss = 60 , restart=10, perturb = 5 ))
all.equal(b1, b2)
[1] "Attributes: < Component “threshold”: Mean relative difference: 0.5 >"
[2] "Component “strength”: Mean relative difference: 1.166667"
[3] "Component “direction”: Mean relative difference: 0.5294118"
First, I am using the boot.strength function from the bnlearn package to bootstrap the data, learn a Bayesian network from each replicate, and obtain arc strengths and directions. Every run gives different results. I could not find a seed argument for this function, and calling set.seed(1) immediately before boot.strength does not help: the results still differ between runs. How can I get reproducible results from this function? Reproducible code is shown above.
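One thing that may be relevant to the cluster case: set.seed() only seeds the master R session, not the worker processes created by makeCluster(). A minimal sketch of seeding the workers themselves with clusterSetRNGStream() from the parallel package is shown here. It uses plain runif() draws rather than bnlearn, and the helper name draw_on_workers is hypothetical, introduced only for illustration.

```r
library(parallel)

# Hypothetical helper: seed the RNG streams on every worker,
# then draw one uniform number per task on the workers.
draw_on_workers <- function(cl, seed, n = 4) {
  clusterSetRNGStream(cl, iseed = seed)  # L'Ecuyer-CMRG streams, one per worker
  parSapply(cl, seq_len(n), function(i) runif(1))
}

cl <- makeCluster(2)
first  <- draw_on_workers(cl, seed = 1)
second <- draw_on_workers(cl, seed = 1)
stopCluster(cl)

# Compare the two runs; both were seeded identically on the workers.
identical(first, second)
```

The analogous step for the code above would be to call clusterSetRNGStream(cl, 1) before each boot.strength(..., cluster = cl) call, in place of set.seed(1). Whether that alone makes boot.strength fully reproducible depends on how bnlearn distributes the replicates across workers, which is an assumption here and not something confirmed by the post.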
Second, I created one network (version 1) by applying thresholds on arc strength and direction to the output of boot.strength, run without any blacklist. I then created another network (version 2) in the same way, with the same thresholds for strength and direction, but with some arcs blacklisted. Since version 2 is just version 1 with additional constraints in the form of a blacklist, I would expect it to have fewer arcs. However, when I repeat this experiment multiple times, version 2 always has more arcs than version 1. This is very systematic, and other people seem to have run into the same issue. Any suggestion on what could cause this anomaly would be greatly appreciated. Reproducible code is shown below. If it is run multiple times with different strength thresholds, or with different values of other parameters such as restart, perturb or R, version 2 (b2) has more arcs than version 1 (b1) in all such experiments.
set.seed(1)
b1 = boot.strength(data = learning.test, R = 5, algorithm = "hc", algorithm.args = list( score="bde", iss = 1 , restart=10, perturb = 5 ))
badarc = expand.grid(c("A","F"), c("B","D"))
colnames(badarc) = c("from", "to")
set.seed(1)
b2 = boot.strength(data = learning.test, R = 5, algorithm = "hc", algorithm.args = list( score="bde", iss = 1 , restart=10, perturb = 5, blacklist = badarc ))
nrow(b1[b1$strength>0.8 & b1$direction>0.5,]) # no. of arcs in version 1
nrow(b2[b2$strength>0.8 & b2$direction>0.5,]) # no. of arcs in version 2