Reputation: 38462
I have a bayesian network of two prior distributions (A and B) and one posterior distribution (C|A,B). How, in scipy, would I find the most likely value of C?
Secondarily, how would I calculate the confidence level for that value? I'd define the confidence level to be equal to the probability that the actual value of C is equal or greater than the given value.
To be more concrete, A and B are cumulative probability functions (cdf) of x. Given a particular a, A(a), gives the probability that the actual value of A is less than a.
Similarly, C is a function of a,b,c and returns the probability that the actual value of C is less than c, given particular values of A and B.
Upvotes: 1
Views: 550
Reputation: 8091
You need to be working with the probability density functions (PDF) not the CDFs. You obtain the PDF from the CDF just by differentiating it (do it numerically if the functions are tabulated). Note that the way that you've specified it, taking the derivative of your
CDF(c | a, b)
with respect to c
gives you the conditional probability density p(c|a,b)
.
To obtain the marginal distribution for c
, you'll need to integrate a,b
out:
[b_min, b_max]=[-10.0, 10.0] # whatever the reasonable bound in b are
[a_min, a_max]=[-10.0,10.0] # whatever the reasonable bounds in a are
def pc_conditional( c, a,b ):
''' conditional probability density p(c|a,b)'''
...
def pa(a):
''' probability density p(a)'''
....
def pb(b):
''' probability density p(b)'''
...
def joint_distribution( c, a,b ):
''' joint distribution over all variables; p(c,a,b)=p(c|a,b)p(a)p(b) '''
return pc_conditional(c,a,b)*pa(a)*pb(b)
def pca_marginal( c, a ):
''' distribution over c,a after integrating out b; p(c,a)=integrate[ p(c,a,b)db] '''
def integrand( b ):
return joint_distribution( c,a ,b)
return scipy.integrate.quad( integrand, b_min, b_max)
def pc_marginal(c):
def integrand(a):
return pca_marginal( c, a )
return scipy.integrate.quad( integrand, a_min, a_max )
# You can all pc_marginal(c) to obtain the (marginal) probability
# density value for the selected value of c. You could use vectorize
# to allow for calling it with an array of values.
Now that you can compute the distribution p(c)
, you can compute any statistical information you like.
Upvotes: 2