I am using the Weka Java API. I trained a BayesNet on an Instances object (data set) without specifying a class (label) attribute.
/**
* Initialization
*/
Instances data = ...;
BayesNet bn = new EditableBayesNet(data);
SearchAlgorithm learner = new TAN();
SimpleEstimator estimator = new SimpleEstimator();
/**
* Training
*/
bn.initStructure();
learner.buildStructure(bn, data);
estimator.estimateCPTs(bn);
Suppose the Instances object data has three attributes, A, B and C, and the dependencies discovered are B->A and C->B. The trained BayesNet object bn is not meant for classification (I did not specify the class attribute for data); I just want to calculate the joint probability Pr(A=x, B=y). How do I get this probability from bn?
As far as I know, the distributionForInstance function of BayesNet may be the closest thing to use. It returns the probability distribution for a given instance (in our case, the instance is (A=x, B=y)). To use it, I could create a new Instance object testDataInstance, set the values A=x and B=y, and call distributionForInstance with testDataInstance.
/**
* Obtain Pr(A="x", B="y")
*/
Instance testDataInstance = new SparseInstance(3);
Instances testDataSet = new Instances(
bn.m_Instances);
testDataSet.clear();
testDataInstance.setValue(testDataSet.attribute("A"), "x");
testDataInstance.setValue(testDataSet.attribute("B"), "y");
testDataSet.add(testDataInstance);
bn.distributionForInstance(testDataSet.firstInstance());
However, to my knowledge, the returned distribution gives the probabilities of all possible values of the class attribute in the BayesNet. As I did not specify a class attribute for data, it is unclear to me what the returned probability distribution means.
Upvotes: 0
Views: 257
The javadoc page for distributionForInstance says that it calculates the class membership probabilities: http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/BayesNet.html#distributionForInstance-weka.core.Instance-
So that is probably not what you want. I think you can use getDistribution(int nTargetNode) or getDistribution(java.lang.String sName) to get your answer. As far as I can tell these are methods of EditableBayesNet, so bn would have to be declared as (or cast to) that type.
P(A=x, B=y) can be calculated as follows,
P(A=x|B=y) = P(A=x, B=y)/P(B=y), which implies,
P(A=x, B=y) = P(A=x|B=y)*P(B=y)
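One step that is easy to miss: B has C as its parent, so the network stores P(B|C) rather than P(B) itself. Assuming the structure from the question (C->B, B->A), the joint distribution factorizes as P(C) P(B|C) P(A|B), and summing C out gives the quantity above. A sketch of that derivation:

```latex
\begin{align*}
P(A=x, B=y) &= \sum_{c} P(C=c)\, P(B=y \mid C=c)\, P(A=x \mid B=y) \\
            &= P(A=x \mid B=y) \sum_{c} P(B=y \mid C=c)\, P(C=c) \\
            &= P(A=x \mid B=y)\, P(B=y)
\end{align*}
```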
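Here is pseudocode that illustrates my approach. It assumes bn is an EditableBayesNet and that x and y are the value indices of A and B:
// In each table, the row is the parent configuration and the column is the node's value
double[][] AP = bn.getDistribution("A"); // AP[b][a] = P(A=a | B=b)
double[][] BP = bn.getDistribution("B"); // BP[c][b] = P(B=b | C=c)
double[][] CP = bn.getDistribution("C"); // C has no parents, so CP[0][c] = P(C=c)
// If x and y are value names rather than ints, something like
// data.attribute("A").indexOfValue("x") should give the index
double BPy = 0;
// P(B=y) is not stored directly, so sum C out: P(B=y) = sum over c of P(B=y | C=c) * P(C=c)
for (int c = 0; c < BP.length; c++) {
    BPy += BP[c][y] * CP[0][c];
}
// BPy now contains P(B=y)
System.out.println(AP[y][x] * BPy);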
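To check the arithmetic without Weka on the classpath, here is a minimal plain-Java sketch of the same calculation. The class name JointFromCpts, the method names, and the CPT values are all hypothetical and made up for illustration; the arrays only mimic the layout I am assuming getDistribution returns (row = parent configuration, column = node value), for the chain C->B->A from the question.

```java
public class JointFromCpts {

    /** P(B = y), obtained by summing P(B = y | C = c) * P(C = c) over all c. */
    public static double marginalB(double[][] pC, double[][] pBgivenC, int y) {
        double sum = 0.0;
        for (int c = 0; c < pBgivenC.length; c++) {
            // pC has a single row because C has no parents
            sum += pBgivenC[c][y] * pC[0][c];
        }
        return sum;
    }

    /** P(A = x, B = y) = P(A = x | B = y) * P(B = y) for the chain C -> B -> A. */
    public static double jointAB(double[][] pC, double[][] pBgivenC,
                                 double[][] pAgivenB, int x, int y) {
        return pAgivenB[y][x] * marginalB(pC, pBgivenC, y);
    }

    public static void main(String[] args) {
        // Hypothetical CPTs, not learned from any data set.
        double[][] pC = {{0.3, 0.7}};                    // P(C)
        double[][] pBgivenC = {{0.9, 0.1}, {0.4, 0.6}};  // rows: C, columns: B
        double[][] pAgivenB = {{0.2, 0.8}, {0.5, 0.5}};  // rows: B, columns: A

        // P(B=0) = 0.9*0.3 + 0.4*0.7 = 0.55, so P(A=1, B=0) is about 0.8 * 0.55 = 0.44
        System.out.println(jointAB(pC, pBgivenC, pAgivenB, 1, 0));
    }
}
```

With a real network you would replace the three hard-coded arrays with the tables for "C", "B" and "A". A quick sanity check is that jointAB summed over every (x, y) pair comes out as 1.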
Upvotes: 1