Compound model in PyMC

Question

I'm trying to use PyMC 2.3 to obtain an estimate of the parameter of a compound model. By "compound" I mean a random variable that follows a distribution whose whose parameter is another random variable. ("nested" or "hierarchical" are somtimes used to refer to this situation, but I think they are less specific and generate more confusion in this context).

Let make a specific example. The "real" data is generated from a compound distribution that is a Poisson with a parameter that is distributed as an Exponential. In plain scipy the data is generated as follows:

import numpy as np
from scipy.stats import distributions
np.random.seed(3)  # for repeatability

nsamples = 1000
tau_true = 50
orig_lambda_sample = distributions.expon(scale=tau_true).rvs(nsamples)
data = distributions.poisson(orig_lambda_sample).rvs(nsamples)

I want to obtain an estimate of the model parameter tau_true. My approach so far in modelling this problem in PyMC is the following:

tau = pm.Uniform('tau', 0, 100)
count_rates = pm.Exponential('count_rates', beta=1/tau, size=nsamples)
counts = pm.Poisson('counts', mu=count_rates, value=data, observed=True)

Note that I use size=nsamples to have a new stochastic variable for each sample.

Finally I run the MCMC as:

model = pm.Model([count_rates, counts, tau])
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000)

The model converges (although slowly, > 10^5 iterations) to a distribution centred around 50 (tau_true). However it seems like an overkill to define 1000 stochastic variables to fit a single distribution with a single parameter.

Is there a better way to describe a compound model in PyMC?

PS I also tried with a more informative prior (tau = pm.Normal('tau', mu=51, tau=1/2**2)) but the results are similar and the convergence is still slow.

Chris Fonnesbeck · Accepted Answer

It looks like what you are trying to model is data that is over-dispersed. In fact, a negative binomial distribution is just a Poisson with a mean that is distributed according to a gamma distribution (the general form of the Exponential). So, one way to get around defining 1000 variables is to use the negative binomial directly. Keep in mind, though, despite there being nominally 1000 variables, the effective number of variables is somewhere between 1 and 1000, depending on how constrained the distribution of means is. You are essentially defining a random effect here.

Compound model in PyMC

Answers (1)

Related Questions