Reputation: 1309
In the Pymc3 example for multilevel linear regression (the example is here, with the radon
data set from Gelman et al.’s (2007)), the intercepts (for different counties) and slopes (for apartment with and without basement) each have a Normal prior. How can I model them together with a multivariate normal prior, so that I can examine the correlation between them?
The hierarchical model given in the example is like this:
with pm.Model() as hierarchical_model:
# Hyperpriors for group nodes
mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
sigma_a = pm.HalfCauchy('sigma_a', 5)
mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
sigma_b = pm.HalfCauchy('sigma_b', 5)
# Intercept for each county, distributed around group mean mu_a
# Above we just set mu and sd to a fixed value while here we
# plug in a common group distribution for all a and b (which are
# vectors of length n_counties).
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_counties)
# Intercept for each county, distributed around group mean mu_a
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_counties)
# Model error
eps = pm.HalfCauchy('eps', 5)
radon_est = a[county_idx] + b[county_idx] * data.floor.values
# Data likelihood
radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
hierarchical_trace = pm.sample(2000)
And I'm trying to make some change to the priors
with pm.Model() as correlation_model:
# Hyperpriors for group nodes
mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
# here I want to model a and b together
# I borrowed some code from a multivariate normal model
# but the code does not work
sigma = pm.HalfCauchy('sigma', 5, shape=2)
C_triu = pm.LKJCorr('C_triu', n=2, p=2)
C = T.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1)
cov = pm.Deterministic('cov', T.nlinalg.matrix_dot(sigma, C, sigma))
tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))
a, b = pm.MvNormal('mu', mu=(mu_a, mu_b), tau=tau,
shape=(n_counties, n_counties))
# Model error
eps = pm.HalfCauchy('eps', 5)
radon_est = a[county_idx] + b[county_idx] * data.floor.values
# Data likelihood
radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
correlation_trace = pm.sample(2000)
Here is the error message I got:
File "<ipython-input-108-ce400c54cc39>", line 14, in <module>
tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))
File "/home/olivier/anaconda3/lib/python3.5/site-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/olivier/anaconda3/lib/python3.5/site-packages/theano/tensor/nlinalg.py", line 73, in make_node
assert x.ndim == 2
AssertionError
Clearly I've made some mistakes about the covariance matrix, but I'm new to pymc3
and completely new to theano
so have no idea how to fix it. I gather this should be a rather common use case so maybe there have been some examples on it? I just can't find them.
The full replicable code and data can be seen on the example page (link given above). I didn't include it here because it's too long and also I thought those familiar with pymc3
are very likely already quite familiar with it:)
Upvotes: 1
Views: 1344
Reputation: 2070
You forgot to add one line when creating the covariance matrix you miss-specified the shape of the MvNormal. Your model should look something like this:
with pm.Model() as correlation_model:
mu = pm.Normal('mu', mu=0., sd=10, shape=2)
sigma = pm.HalfCauchy('sigma', 5, shape=2)
C_triu = pm.LKJCorr('C_triu', n=2, p=2)
C = tt.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1.)
sigma_diag = tt.nlinalg.diag(sigma) # this line
cov = tt.nlinalg.matrix_dot(sigma_diag, C, sigma_diag)
tau = tt.nlinalg.matrix_inverse(cov)
ab = pm.MvNormal('ab', mu=mu, tau=tau, shape=(n_counties, 2))
eps = pm.HalfCauchy('eps', 5)
radon_est = ab[:,0][county_idx] + ab[:,1][county_idx] * data.floor.values
radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
trace = pm.sample(2000)
Notice that alternatively, you can evaluate the correlation of the intercept and the slope from the posterior of hierarchical_model
. You can use a frequentist method or build another Bayesian model, that takes as the observed data the result of hierarchical_model
. May be this could be faster.
EDIT
If you want to evaluate the correlation of two variables from the posterior you can do something like.
chain = hierarchical_trace[100:]
x_0 = chain['mu_a']
x_1 = chain['mu_b']
X = np.vstack((x_0, x_1)).T
and then you can run the following model:
with pm.Model() as correlation:
mu = pm.Normal('mu', mu=0., sd=10, shape=2)
sigma = pm.HalfCauchy('sigma', 5, shape=2)
C_triu = pm.LKJCorr('C_triu', n=2, p=2)
C = tt.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1.)
sigma_diag = tt.nlinalg.diag(sigma)
cov = tt.nlinalg.matrix_dot(sigma_diag, C, sigma_diag)
tau = tt.nlinalg.matrix_inverse(cov)
yl = pm.MvNormal('yl', mu=mu, tau=tau, shape=(2, 2), observed=X)
trace = pm.sample(5000, pm.Metropolis())
You can replace x_0 and x_1 according to your needs. For example you may want to do:
x_0 = np.random.normal(chain['mu_a'], chain['sigma_a'])
x_1 = np.random.normal(chain['mu_b'], chain['sigma_b'])
Upvotes: 4