Andrew

Reputation: 15

A Simple Bayesian Network with a Coin-Flipping Problem

I am trying to implement a Bayesian network and solve a regression problem using PyMC3. In my model, a fair coin is the parent node. If the parent comes up heads (H), the child node draws from the normal distribution N(5, 0.2); if tails (T), it draws from N(0, 0.5). Here is an illustration of my network.

My network

To simulate this network, I generated a sample dataset and tried doing Bayesian regression using the code below. Currently, the model fits only the child node, as if the parent node did not exist. I would greatly appreciate it if anyone could let me know how to implement the conditional probability P(D|C). Ultimately, I am interested in finding the probability distributions for mu1 and mu2. Thank you!

import numpy as np
import pymc3 as pm
import theano
from scipy.stats import bernoulli

# Generate data for coin flip P(C) and store in c1
theta_real = 0.5  # unknown value in a real experiment
n_sample = 10
c1 = bernoulli.rvs(p=theta_real, size=n_sample)

# Generate data for normal distribution P(D|C) and store in d1
np.random.seed(123)
mu1 = 0
sigma1 = 0.5
mu2 = 5
sigma2 = 0.2

d1 = []
for index, item in enumerate(c1):
    if item == 0:
        d1.extend(np.random.normal(mu1, sigma1, 1))
    else:
        d1.extend(np.random.normal(mu2, sigma2, 1))

# I start building the PyMC3 model here
c1_tensor = theano.shared(np.array(c1))
d1_tensor = theano.shared(np.array(d1))
with pm.Model() as model:
    # define prior for c1.  I am not sure how to do this.
    #c1_present = pm.Categorical('c1', observed=c1_tensor)

    # how do I incorporate P(D | C)?
    mu_prior = pm.Normal('mu', mu=2, sd=2, shape=1)
    sigma_prior = pm.HalfNormal('sigma', sd=2, shape=1)
    y_likelihood = pm.Normal('y', mu=mu_prior, sd=sigma_prior, observed=d1_tensor)

Upvotes: 1

Views: 731

Answers (2)

merv

Reputation: 76700

This answer supplements @balleveryday's answer, which suggests a Gaussian mixture model; I had some trouble getting its symmetry breaking to work, though. Admittedly, the symmetry breaking in the official example is done in the context of Metropolis-Hastings sampling, whereas NUTS might be a little more sensitive to encountering impossible values (I'm not sure). Here's what worked for me:

import numpy as np
import pymc3 as pm
from scipy.stats import bernoulli
import theano.tensor as tt

# everything should reproduce
np.random.seed(123)
n_sample = 2000

# Generate data for coin flip P(C) and store in c1
theta_real = 0.2 # unknown value in a real experiment

c1 = bernoulli.rvs(p=theta_real, size=n_sample)

# Generate data for normal distribution P(D|C) and store in d1
mu1, mu2 = 0, 5
sigma1, sigma2 = 0.5, 0.2

d1 = np.empty_like(c1, dtype=np.float64)
d1[c1 == 0] = np.random.normal(mu1, sigma1, np.sum(c1 == 0))
d1[c1 == 1] = np.random.normal(mu2, sigma2, np.sum(c1 == 1))

with pm.Model() as gmm_asym:
    # mixture vector
    w = pm.Dirichlet('p', a=np.ones(2))

    # Gaussian parameters (testval helps start off ordered)
    mu = pm.Normal('mu', 0, 20, shape=2, testval=[-10, 10])
    sigma = pm.HalfNormal('sigma', sd=2, shape=2)

    # break symmetry, forcing mu[0] < mu[1]
    order_means_potential = pm.Potential('order_means_potential',
                                         tt.switch(mu[1] - mu[0] < 0, -np.inf, 0))

    # observed
    pm.NormalMixture('like', w=w, mu=mu, sigma=sigma, observed=d1)

    # reproducible sampling
    tr_gmm_asym = pm.sample(tune=2000, target_accept=0.9, random_seed=20191121)

This produces samples with the following statistics (as reported by pm.summary):

          mean      sd        mc_error  hpd_2.5   hpd_97.5  n_eff       Rhat
mu__0     0.004549  0.011975  0.000226 -0.017398  0.029375  2425.487301 0.999916
mu__1     5.007663  0.008993  0.000166  4.989247  5.024692  2181.134002 0.999563
p__0      0.789983  0.009091  0.000188  0.773059  0.808062  2417.356539 0.999788
p__1      0.210017  0.009091  0.000188  0.191938  0.226941  2417.356539 0.999788
sigma__0  0.497322  0.009103  0.000186  0.480394  0.515867  2227.397854 0.999358
sigma__1  0.191310  0.006633  0.000141  0.178924  0.204859  2286.817037 0.999614

and the traces

Density and trace plots
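
If you want to sanity-check the fit beyond the trace, a posterior predictive check is one option. A minimal sketch, assuming a recent PyMC3 where this is called pm.sample_posterior_predictive (older versions name it pm.sample_ppc), plus matplotlib for the overlay:

import matplotlib.pyplot as plt

with gmm_asym:
    ppc = pm.sample_posterior_predictive(tr_gmm_asym, samples=200)

# overlay posterior predictive draws on the observed data
plt.hist(d1, bins=50, density=True, alpha=0.5, label='observed')
plt.hist(ppc['like'].ravel(), bins=50, density=True, alpha=0.5,
         label='posterior predictive')
plt.legend()
plt.show()

The two histograms should line up if the mixture has captured both modes and the mixture weight.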

Upvotes: 1

LeoC

Reputation: 932

You could use a Dirichlet distribution as the prior for the coin toss and a NormalMixture as the likelihood over the two Gaussians. In the following snippet I changed the fairness of the coin and increased the number of coin tosses, but you can adjust these however you want:

import numpy as np
import pymc3 as pm
from scipy.stats import bernoulli

# Generate data for coin flip P(C) and store in c1
theta_real = 0.2  # unknown value in a real experiment

n_sample = 2000
c1 = bernoulli.rvs(p=theta_real, size=n_sample)

# Generate data for normal distribution P(D|C) and store in d1
np.random.seed(123)
mu1 = 0
sigma1 = 0.5
mu2 = 5
sigma2 = 0.2

d1 = []
for index, item in enumerate(c1):
    if item == 0:
        d1.extend(np.random.normal(mu1, sigma1, 1))
    else:
        d1.extend(np.random.normal(mu2, sigma2, 1))

with pm.Model() as model:
    w = pm.Dirichlet('p', a=np.ones(2))
    mu = pm.Normal('mu', 0, 20, shape=2)
    sigma = np.array([0.5, 0.2])  # standard deviations fixed at their true values
    pm.NormalMixture('like', w=w, mu=mu, sigma=sigma, observed=np.array(d1))
    trace = pm.sample()

pm.summary(trace)

This will give you the following:

           mean        sd  mc_error   hpd_2.5  hpd_97.5        n_eff      Rhat
mu__0  4.981222  0.023900  0.000491  4.935044  5.027420  2643.052184  0.999637
mu__1 -0.007660  0.004946  0.000095 -0.017388  0.001576  2481.146286  1.000312
p__0   0.213976  0.009393  0.000167  0.195602  0.231803  2245.905021  0.999302
p__1   0.786024  0.009393  0.000167  0.768197  0.804398  2245.905021  0.999302

The parameters are recovered nicely, as you can also see from the traceplots. (Note that nothing enforces an ordering on the two components here, so the labels come out switched relative to the data-generating code: component 0 is the mu ≈ 5 Gaussian, and p__0 ≈ 0.21 recovers theta_real = 0.2.)

Density and trace plots

The above implementation will give you the posteriors of theta_real, mu1, and mu2, but I could not get convergence when I added sigma1 and sigma2 as parameters to be estimated from the data (even though the prior was quite narrow):

with pm.Model() as model:
    w = pm.Dirichlet('p', a=np.ones(2))
    mu = pm.Normal('mu', 0, 20, shape=2)
    sigma = pm.HalfNormal('sigma', sd=2, shape=2)
    pm.NormalMixture('like', w=w, mu=mu, sigma=sigma, observed=np.array(d1))
    trace = pm.sample()


print(pm.summary(trace))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, mu, p]
Sampling 4 chains: 100%|██████████| 4000/4000 [00:10<00:00, 395.57draws/s] 
The acceptance probability does not match the target. It is 0.883057127209148, but should be close to 0.8. Try to increase the number of tuning steps.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
              mean        sd  mc_error  ...  hpd_97.5     n_eff        Rhat
mu__0     1.244021  2.165433  0.216540  ...  5.005507  2.002049  212.596596
mu__1     3.743879  2.165122  0.216510  ...  5.012067  2.002040  235.750129
p__0      0.643069  0.248630  0.024846  ...  0.803369  2.004185   30.966189
p__1      0.356931  0.248630  0.024846  ...  0.798632  2.004185   30.966189
sigma__0  0.416207  0.125435  0.012517  ...  0.504110  2.009031   17.333177
sigma__1  0.271763  0.125539  0.012533  ...  0.497208  2.007779   19.217223

[6 rows x 7 columns]

Density and trace plots

Based on that, you will most likely need to reparameterize if you also want to estimate the two standard deviations from this data; one option is sketched below.
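
For instance, breaking the label symmetry by constraining the means to be ordered often restores convergence in mixtures like this. A minimal sketch, assuming PyMC3's ordered transform (pm.distributions.transforms.ordered) and the d1 generated above; the testval just starts the chains inside the ordered region:

import pymc3.distributions.transforms as tr

with pm.Model() as model_ordered:
    w = pm.Dirichlet('p', a=np.ones(2))
    # the ordered transform constrains mu[0] < mu[1], removing the label switching
    mu = pm.Normal('mu', 0, 20, shape=2,
                   transform=tr.ordered, testval=np.array([-1.0, 1.0]))
    sigma = pm.HalfNormal('sigma', sd=2, shape=2)
    pm.NormalMixture('like', w=w, mu=mu, sigma=sigma, observed=np.array(d1))
    trace_ordered = pm.sample()

The other answer's pm.Potential trick achieves the same thing; either way, the point is to remove the mirror-image posterior mode that the sampler was jumping between.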

Upvotes: 1
