Reputation: 11
Goodmorning everyone, This is Riccardo from Italy. This is my first rodeo on StackOverflow's question, so sorry if I'am doing something wrong.
I am also not so expert about coding in R and I am struggling with the following issue: I am try to build a Bayesian Network in R starting from Genetic Expression Data (link for file carm.csv) using a model averaging MMHC (bootstrap on MaxMin Hill-Climbing) to learn a robust structure and a Bayes approach for parameter learning.
All is good util I use the function bn.fit {bnlearn} for parameter learning, as an error message occurs: "Error in bn.fit(...) : the graph is only partially directed.". It is true that the graph in input is not directed, but this is an University homework, and the it is not supposed to find a undirected graph (also, it is strange only one arch, btw).
I tried to search and maybe found this useful stuff Bayesian Networks in R, O'Reilly 2013, p. 35:
“bnlearn 3.2 and later versions are more picky about setting arc directions; as a result bn.gs is an undirected graph and must be extended into a DAG with cextend() to conclude the example.” ...but forcing a graph in a DAG using a function ain't so much funny to me:)
Could you please help me on this? Do you see something wrong on my code?
I attach the code here:
library(bnlearn)
library(Rgraphviz)
###BAYESIAN NETWORKS
#Use a Bayesian network to analyze genetic data (in the file carm.csv) on gene
#expression measured in a series of cytokines in order to assess
#their association with CARM protein expression on a sample of 42 subjects.
#Discretize the data using Hartemink's method (considering 3 final levels
# and 8 initial levels) and proceed with network structure learning
#using a hybrid-type algorithm (Max-Min Hill-Climbing)
#Load:
data <- as.data.frame(read.csv("carm.csv", sep = ";",))
#Discretization:
discret.data<-discretize(data, method='hartemink', breaks=3, ibreaks=8, idisc='quantile')
#Use the model averaging approach to obtain a robust final network
#(again with the mmhc algorithm), using 200 bootstrap samples
ddat<-discret.data
set.seed(123)
boot<-boot.strength(data=ddat,R=200,algorithm="mmhc")
print(boot)
plot(boot)
#Use a threshold of 70% to obtain a network using the averaged.network command,
#and then use the bayes method to do parameter learning:
#Average model:
avg.boot<-averaged.network(boot, threshold=0.70)
print(avg.boot)
plot(avg.boot)
#Parameter learning via Bayes Method:
dag.fit<-bn.fit(avg.boot, ddat, method="bayes")
Upvotes: 1
Views: 892
Reputation: 173
averaged.network
may return a partially directed graph in some situations when the probability is exactly 0.5 for both the two possible directions of the arc. One thing you can try is increasing the number of bootstrap replicates from 200 to let's say 500. That means 500 DAGs will be created from bootstrapped data in order to find the most likely arcs and directions. The more DAGs in your bootstrapping, the greater the chances you can "break the tie" and direct the arcs.
If that doesn't work, you still have an undirected arc and you need to deal with it. There are many ways of doing that, which includes experimenting with other algorithms. The solution really depends on your use case and domain. But, generally speaking, for many domains and use cases, you can simply ignore that arc and you'll still have valid DAG that can be trained to encode the joint probability of your variables. This code will do that, and will certainly make your code work, if you feel good disregarding that arc:
avg.boot<-averaged.network(boot, threshold=0.70)
arcs(avg.boot) <- directed.arcs(avg.boot) # ignore undirected arcs
Your bn.fit will certainly work now that you've removed the undirected arcs.
Upvotes: 0