I am using the bnlearn package in R to learn a Bayesian network using constraint-based structure learning algorithms (I tried gs and pc.stable).
I am trying to apply a blacklist to restrict certain directed arcs, but the blacklist does not seem to be enforced as expected.
R version 4.3.3, bnlearn version 4.9.3.
bl = matrix(c("A", "B",
              "A", "C",
              "B", "C"),
            ncol = 2, byrow = TRUE)
blReverse = matrix(c("B", "A",
                     "C", "A",
                     "C", "B"),
                   ncol = 2, byrow = TRUE)
par(mfrow = c(1, 3))
# no blacklist
graphviz.plot(gs(df), main = "No BL")
# blacklist
graphviz.plot(gs(df, blacklist = bl), main = "BL")
# reverse blacklist
graphviz.plot(gs(df, blacklist = blReverse), main = "Reverse BL")
I expect the blacklists to restrict specific arcs as described below:
Blacklist (bl): I expect the arcs A → B, A → C, and B → C to be excluded from the learned network.
Reversed Blacklist (blReverse): I expect the reverse arcs B → A, C → A, and C → B to be excluded.
However, the results do not reflect this. For example:
With bl: I still see the arcs A → B and A → C, even though they are in the blacklist.
With blReverse: arcs such as C → A and C → B still appear, although they should be excluded.
This is contrary to what I understand from the documentation:
An arc can be blacklisted in one direction (i.e. A → B is in the blacklist but B → A is not), leaving the algorithm free to include the same arc but in the opposite direction if it is supported by the data. This is useful in setting the direction of arcs which would otherwise be undirected because of score equivalence, as A – B in the network above. [https://www.bnlearn.com/examples/whitelist/]
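To make the documented semantics concrete, here is a minimal sketch that builds the one-direction blacklist from the question, its reverse, and a version that forbids each pair in both directions. The names bl_reverse and bl_both are illustrative, not from the question. If blacklists are enforced correctly, the two-direction version should leave no edge at all among A, B and C; the bnlearn call only runs when the question's df is in the workspace.

```r
# One-direction blacklist from the question: forbids A -> B, A -> C, B -> C,
# leaving B -> A, C -> A and C -> B available if the data support them.
bl <- matrix(c("A", "B",
               "A", "C",
               "B", "C"),
             ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("from", "to")))

# Swapping the two columns gives the reverse blacklist (blReverse in the question).
bl_reverse <- bl[, c("to", "from")]
colnames(bl_reverse) <- c("from", "to")

# Stacking both forbids each pair in either direction.
bl_both <- rbind(bl, bl_reverse)
print(bl_both)

# With the question's df in the workspace, the learned graph should then
# contain no arcs among A, B and C (get0("df") otherwise finds stats::df).
if (requireNamespace("bnlearn", quietly = TRUE) && is.data.frame(get0("df"))) {
  print(bnlearn::arcs(bnlearn::gs(df, blacklist = bl_both)))
}
```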
Why is the blacklist not being properly enforced? Is there something wrong with how I am applying the blacklist? Could this be a bug in the structural learning algorithm, or am I misunderstanding how the blacklist should work?
Reproducible example:
library(bnlearn)
set.seed(123)
# DATAFRAME
n <- 100 # number of rows
# variable A
A <- sample(c("Low", "Medium", "High"),
            n,
            replace = TRUE,
            prob = c(0.3, 0.4, 0.3))
# variable B based on A
B <- sapply(A, function(x) {
  if (x == "Low")
    sample(c("Small", "Medium"), 1, prob = c(0.7, 0.3))
  else if (x == "Medium")
    sample(c("Small", "Medium", "Large"), 1, prob = c(0.2, 0.6, 0.2))
  else
    sample(c("Medium", "Large"), 1, prob = c(0.4, 0.6))
})
# variable C based on A and B
C <- sapply(1:n, function(i) {
  if (A[i] == "Low" && B[i] == "Small")
    sample(c("Red", "Blue"), 1, prob = c(0.8, 0.2))
  else if (A[i] == "High" && B[i] == "Large")
    sample(c("Green", "Yellow"), 1, prob = c(0.7, 0.3))
  else
    sample(c("Red", "Blue", "Green", "Yellow"), 1)
})
df <- data.frame(A = A, B = B, C = C)
df <- as.data.frame(lapply(df, as.factor))
class(df)
lapply(df, class)
# BLACKLIST
bl = matrix(c("A", "B",
              "A", "C",
              "B", "C"),
            ncol = 2, byrow = TRUE)
blReverse = matrix(c("B", "A",
                     "C", "A",
                     "C", "B"),
                   ncol = 2, byrow = TRUE)
# PLOT
par(mfrow = c(1, 3))
# no blacklist
graphviz.plot(gs(df), main = "No BL")
# blacklist
graphviz.plot(gs(df, blacklist = bl), main = "BL")
# reverse blacklist
graphviz.plot(gs(df, blacklist = blReverse), main = "Reverse BL")
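Rather than reading arcs off the plots, a small base R helper can list which blacklisted arcs actually appear in a fitted network. This is a sketch: blacklisted_arcs_present is a hypothetical name, not a bnlearn function. It assumes the arcs are passed as a two-column (from, to) matrix such as the one returned by bnlearn's directed.arcs(). Using directed.arcs() instead of arcs() matters because bnlearn stores an undirected edge between A and B as the pair A → B and B → A, which the plot draws as a line without arrowheads and which would otherwise be flagged as a violation.

```r
# Hypothetical helper (not part of bnlearn): returns the rows of `blacklist`
# that also occur in `learned`. Both are two-column (from, to) matrices.
blacklisted_arcs_present <- function(learned, blacklist) {
  key <- function(m) paste(m[, 1], m[, 2], sep = " -> ")
  blacklist[key(blacklist) %in% key(learned), , drop = FALSE]
}

bl <- matrix(c("A", "B",
               "A", "C",
               "B", "C"),
             ncol = 2, byrow = TRUE)

# Made-up arc set standing in for a learned network: A -> B is blacklisted,
# C -> B is not.
learned <- matrix(c("A", "B",
                    "C", "B"),
                  ncol = 2, byrow = TRUE)
violations <- blacklisted_arcs_present(learned, bl)
print(violations)

# With the question's df in the workspace, check the real fit. Only directed
# arcs are passed, since bnlearn stores an undirected edge as a pair of
# opposite arcs.
if (requireNamespace("bnlearn", quietly = TRUE) && is.data.frame(get0("df"))) {
  fit <- bnlearn::gs(df, blacklist = bl)
  print(blacklisted_arcs_present(bnlearn::directed.arcs(fit), bl))
}
```

An empty result for the real fit means no blacklisted arc was learned as a directed arc.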
The problem was fixed by the author of the library, Marco Scutari.
The corrected version is available in the development snapshots which, as explained on https://www.bnlearn.com/, can be installed with:
install.packages("http://www.bnlearn.com/releases/bnlearn_latest.tar.gz", repos = NULL, type = "source")
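After installing, it is worth confirming which version R actually loads; an older copy in another library path could still be picked up. A minimal sketch using base R's packageVersion(): has_fix is a hypothetical helper, and the 5.1 threshold is an assumption based on the snapshot version mentioned in this answer, not a documented cut-off.

```r
# Assumption: treat 5.1 (the snapshot series used in this answer) as the first
# version containing the fix; adjust the threshold if that turns out wrong.
has_fix <- function(version) {
  as.numeric_version(version) >= "5.1"
}

if (requireNamespace("bnlearn", quietly = TRUE)) {
  v <- packageVersion("bnlearn")
  message("bnlearn ", v, if (has_fix(v)) ": 5.1 or later" else ": older than 5.1")
}
```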
With the new version, bnlearn_5.1-20240924, the same code as before gives the expected result: the blacklisted arcs no longer appear (tested with R version 4.4.1 (2024-06-14 ucrt)).