Mark Jin

Reputation: 2903

Weka API: How to obtain a joint probability, e.g., Pr(A=x, B=y), from a BayesNet object?

I am using the Weka Java API. I trained a BayesNet on an Instances object (data set) without specifying a class (label) attribute.

/**
 * Initialization
 */
Instances data = ...;
BayesNet bn = new EditableBayesNet(data);
SearchAlgorithm learner = new TAN();
SimpleEstimator estimator = new SimpleEstimator();
/**
 * Training
 */
bn.initStructure();
learner.buildStructure(bn, data);
estimator.estimateCPTs(bn);

Suppose the Instances object data has three attributes, A, B and C, and the dependency discovered is B->A, C->B.

The trained BayesNet object bn is not meant for classification (I did not specify a class attribute for data); I just want to calculate the joint probability Pr(A=x, B=y). How do I get this probability from bn?

As far as I know, the distributionForInstance function of BayesNet may be the closest thing to use. It returns the probability distribution for a given instance (in our case, the instance is (A=x, B=y)). To use it, I could create a new Instance object testDataInstance, set the values A=x and B=y, and call distributionForInstance with testDataInstance.

/**
 * Obtain Pr(A="x", B="y")
 */ 
Instance testDataInstance = new SparseInstance(3);
Instances testDataSet = new Instances(
            bn.m_Instances);
testDataSet.clear();
testDataInstance.setValue(testDataSet.attribute("A"), "x");
testDataInstance.setValue(testDataSet.attribute("B"), "y");
testDataSet.add(testDataInstance);
bn.distributionForInstance(testDataSet.firstInstance());

However, to my knowledge, the probability distribution indicates probabilities of all possible values for the class attribute in the bayesnet. As I did not specify a class attribute for data, it is unclear to me what the returned probability distribution means.

Upvotes: 0

Views: 257

Answers (1)

mettleap

Reputation: 1410

The javadoc page for distributionForInstance says that it calculates the class membership probabilities: http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/BayesNet.html#distributionForInstance-weka.core.Instance-

So that is probably not what you want. I think you can use getDistribution(int nTargetNode) or getDistribution(java.lang.String sName) instead; each returns the conditional probability table of a single node.

P(A=x, B=y) can be calculated as follows,

P(A=x|B=y) = P(A=x, B=y)/P(B=y), which implies,

P(A=x, B=y) = P(A=x|B=y)*P(B=y)
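
For the structure described in the question (C->B and B->A), the network stores P(C), P(B|C) and P(A|B), so P(B=y) is not available as a single table entry. A sketch of the missing step, assuming that chain structure holds, is to sum C out:

```latex
\begin{aligned}
P(B=y) &= \sum_{c} P(B=y \mid C=c)\,P(C=c) \\
P(A=x, B=y) &= P(A=x \mid B=y)\,\sum_{c} P(B=y \mid C=c)\,P(C=c)
\end{aligned}
```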

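Here is a sketch which illustrates my approach,

// getDistribution(name) returns a table indexed as [parent configuration][node value].
// It appears to be declared on EditableBayesNet, so bn may need to be typed (or cast) as one.
double[][] AP = bn.getDistribution("A"); // AP[b][a] = P(A=a | B=b)
double[][] BP = bn.getDistribution("B"); // BP[c][b] = P(B=b | C=c)
double[][] CP = bn.getDistribution("C"); // C has no parents, so CP[0][c] = P(C=c)
double BPy = 0;

// I am assuming x and y are the int indices of the attribute values;
// if they are strings, they have to be mapped to indices first.
// B depends on C, so P(B=y) is obtained by summing C out:
// P(B=y) = sum over c of P(B=y | C=c) * P(C=c)
for (int c = 0; c < BP.length; c++) {
    BPy += BP[c][y] * CP[0][c];
}
// BPy now contains P(B=y)
System.out.println(AP[y][x] * BPy);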
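
To check the arithmetic independently of Weka, here is a minimal standalone sketch for the chain C->B->A. The class name ChainJoint, the method joint and all the numbers are made up for illustration; the tables are laid out as [parent configuration][node value], which is the layout I am assuming getDistribution returns.

```java
public class ChainJoint {

    /**
     * Computes P(A=x, B=y) for the chain C -> B -> A.
     * Each table is indexed as [parent configuration][node value];
     * a node without parents has a single row.
     */
    public static double joint(double[][] aGivenB, double[][] bGivenC,
                               double[][] c, int x, int y) {
        double pBy = 0.0;
        // Sum C out: P(B=y) = sum over ci of P(B=y | C=ci) * P(C=ci)
        for (int ci = 0; ci < bGivenC.length; ci++) {
            pBy += bGivenC[ci][y] * c[0][ci];
        }
        // Product rule: P(A=x, B=y) = P(A=x | B=y) * P(B=y)
        return aGivenB[y][x] * pBy;
    }

    public static void main(String[] args) {
        // Hypothetical CPTs, chosen only to make the arithmetic easy to follow.
        double[][] pC = {{0.5, 0.5}};                       // P(C)
        double[][] pBgivenC = {{0.25, 0.75}, {0.5, 0.5}};   // P(B | C)
        double[][] pAgivenB = {{0.5, 0.5}, {0.25, 0.75}};   // P(A | B)

        // P(B=1) = 0.75*0.5 + 0.5*0.5 = 0.625, so P(A=0, B=1) = 0.25 * 0.625
        System.out.println(joint(pAgivenB, pBgivenC, pC, 0, 1)); // prints 0.15625
    }
}
```

Summing joint(...) over all values of x and y should give 1, which is a quick sanity check on the tables. In Weka, the index of a nominal value such as "x" can be looked up with Attribute.indexOfValue before indexing into the tables.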

Upvotes: 1
