Martin Barus

Error in bn.fit predict function in bnlearn R

I have learned and fitted a Bayesian network with the bnlearn R package, and I want to predict the value of its "event" node.

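library(bnlearn)

fl = "data/discrete_kdd_10.txt"
h = TRUE
# stringsAsFactors = TRUE: since R 4.0.0 text columns are read as character by default
dtbl1 = read.csv(file = fl, header = h, sep = ",", stringsAsFactors = TRUE)
net = hc(dtbl1)              # learn the structure with hill climbing
fitted = bn.fit(net, dtbl1)  # estimate the parameters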

I want to predict the value of the "event" node from evidence stored in another file that has the same structure as the file used for learning.

fileName="data/dcmp.txt"
dtbl2 = read.csv(file=fileName, head=h, sep=",")
predict(fitted,"event",dtbl2)

However, predict fails with the following error: Error in check.data(data) : variable duration must have at least two levels.

I don't understand why there should be any restriction on the number of levels of the variables in the evidence data.frame.

The dtbl2 data.frame contains only a few rows, one for each scenario in which I want to predict the "event" value.

I know I can use cpquery, but I also want to use the predict function for networks with mixed variables (both discrete and continuous). I haven't figured out how to use evidence on a continuous variable in cpquery.

Can someone please explain what I'm doing wrong with the predict function and how I should do it correctly? Thanks in advance!

Answers (1)

Martin Barus

The problem was that reading the evidence data.frame in

fileName="data/dcmp.txt"
dtbl2 = read.csv(file=fileName, head=h, sep=",")
predict(fitted,"event",dtbl2)

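caused the categorical variables to become factors with a different number of levels (a subset of the levels in the original training set). A column in which the few evidence rows share a single value therefore becomes a one-level factor, which is what check.data rejects.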
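
The mismatch can be reproduced in base R without bnlearn. This is a minimal sketch with made-up CSV contents: the column `protocol` and all values are hypothetical, only the name `duration` comes from the error message, and `stringsAsFactors = TRUE` mimics the pre-4.0 default that turned text columns into factors.

```r
# hypothetical training data: "duration" takes two values
training <- read.csv(text = "duration,protocol\nshort,tcp\nlong,udp\nshort,udp",
                     stringsAsFactors = TRUE)

# hypothetical evidence file with a single scenario
evidence <- read.csv(text = "duration,protocol\nlong,tcp",
                     stringsAsFactors = TRUE)

levels(training$duration)  # "long" "short"
levels(evidence$duration)  # "long"  (only the values that occur in this file)
```

Each data frame gets its levels only from the values that actually appear in it, so a one-row evidence file always yields one-level factors.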

I used the following code to solve this issue.

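# align the levels of every discrete column of dtbl2 with those of dtbl1
for (v in names(dtbl2)) {
  if (is.factor(dtbl1[[v]]))  # skip continuous columns, which have no levels
    dtbl2[[v]] = factor(dtbl2[[v]], levels = levels(dtbl1[[v]]))
}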
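
If the same alignment is needed in several places, it can be wrapped in a small function. The sketch below uses a hypothetical helper, `align_levels()`, that is not part of bnlearn. It matches columns by name rather than by position, leaves numeric (continuous) columns untouched, and warns when the evidence contains a value never seen in training, because `factor()` silently turns such values into NA.

```r
# Hypothetical helper (not a bnlearn function): align the factor levels
# of `evidence` with those of `training`.
align_levels <- function(evidence, training) {
  # only columns present in both data frames are considered
  for (v in intersect(names(evidence), names(training))) {
    # continuous (numeric) columns are left untouched
    if (!is.factor(training[[v]])) next
    x <- as.character(evidence[[v]])
    unseen <- setdiff(unique(x[!is.na(x)]), levels(training[[v]]))
    if (length(unseen) > 0)
      warning(sprintf("column '%s' has values not seen in training: %s",
                      v, paste(unseen, collapse = ", ")))
    # values outside the training levels become NA
    evidence[[v]] <- factor(x, levels = levels(training[[v]]))
  }
  evidence
}

# usage with made-up data: one discrete and one continuous column
training <- data.frame(duration = factor(c("short", "long")), bytes = c(10, 20))
evidence <- data.frame(duration = "long", bytes = 15)
out <- align_levels(evidence, training)
levels(out$duration)  # "long" "short"
```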

By the way, the bnlearn package does fit models with mixed variables (conditional linear Gaussian networks) and also provides functions for prediction in them.
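
As a rough illustration of the mixed case, here is a minimal sketch assuming a recent bnlearn version. The data, the node names, and the structure `[event][duration|event]` are all synthetic and hypothetical. `predict()` with `method = "bayes-lw"` uses the observed columns as evidence via likelihood weighting, and `cpquery()` with `method = "lw"` accepts evidence as a named list, which is how a continuous value can be fixed. Both are simulation-based, so results vary slightly unless a seed is set.

```r
library(bnlearn)

set.seed(1)
n <- 1000
# synthetic discrete target and continuous evidence variable
status <- factor(sample(c("normal", "attack"), n, replace = TRUE))
training <- data.frame(
  event = status,
  duration = ifelse(status == "attack",
                    rnorm(n, mean = 10, sd = 2),
                    rnorm(n, mean = 2, sd = 1))
)

# conditional linear Gaussian network: discrete parent, continuous child
dag <- model2network("[event][duration|event]")
fitted <- bn.fit(dag, training)

# evidence: only the continuous variable is observed, one row per scenario
evidence <- data.frame(duration = c(1.5, 11))
pred <- predict(fitted, node = "event", data = evidence, method = "bayes-lw")
print(pred)

# the same kind of query with cpquery: continuous evidence as a named list
p <- cpquery(fitted, event = (event == "attack"),
             evidence = list(duration = 11), method = "lw")
print(p)
```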
