Difference between BUGS model and PyMC?

Question

I'm unable to replicate results from provided BUGS code using PyMC. The BUGS model is the Andersen-Gill multiplicative intensity Cox PH model.

model
{   
    # Set up data
        for(i in 1:Nsubj) {
            for(j in 1:T) {
    # risk set = 1 if obs.t >= t
                Y[i,j] <- step(obs.t[i] - t[j] + eps)
    # counting process jump = 1 if obs.t in [ t[j], t[j+1] )
    #                      i.e. if t[j] <= obs.t < t[j+1]
                dN[i, j] <- Y[i, j] * step(t[j + 1] - obs.t[i] - eps) * FAIL[i]
            }
            Useless[i] <- pscenter[i] + hhcenter[i] + ncomact[i] 
            + rleader[i] + dleader[i] + inter1[i] + inter2[i]
        }

    # Model 
        for(j in 1:T) {
            for(i in 1:Nsubj) {
                dN[i, j]   ~ dpois(Idt[i, j])              # Likelihood
                Idt[i, j] <- Y[i, j] * exp(beta[1]*pscenter[i] + beta[2]*
                hhcenter[i] + beta[3]*ncomact[i] + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) * dL0[j]    # Intensity
            }     
            dL0[j] ~ dgamma(mu[j], c)
            mu[j] <- dL0.star[j] * c    # prior mean hazard 
        }


    c ~ dgamma(0.0001, 0.00001)
    r ~ dgamma(0.001, 0.0001)


    for (j in 1 : T) {  dL0.star[j] <- r * (t[j + 1] - t[j])  } 
    # next line indicates number of covariates and is for the corresponding betas
    for(i in 1:7) {beta[i] ~ dnorm(0.0,0.00001)} 


}

I use the following initial values

list(beta=c(-.36,-.26,-.29,-.22,-.61,-9.73,-.23),  c=0.01, r=0.01, dL0=c(1, 1, 1, 1, 1,     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

I use a single chain (for now) and 5000 iterations for burn-in. I run the estimation for 10000 additional iterations and receive the same point estimates as reported in the paper. These are also close to earlier frequentist estimates.

OpenBUGS> samplesStats('beta')
                mean    sd      MC_error        val2.5pc        median  val97.5pcstart   sample
        beta[1] 3.466   0.8906  0.03592 1.696   3.48    5.175   501     9500
        beta[2] -0.04155        0.06253 0.002487        -0.1609 -0.04355        0.08464  501     9500
        beta[3] -0.009709       0.07353 0.002008        -0.1544 -0.01052        0.1365   501     9500
        beta[4] 0.3535  0.1788  0.004184        -0.01523        0.3636  0.6724  501      9500
        beta[5] 0.08454 0.1652  0.004261        -0.2464 0.08795 0.3964  501     9500
        beta[6] -4.109  1.325   0.05224 -6.617  -4.132  -1.479  501     9500
        beta[7] 0.1413  0.08594 0.003381        -0.03404        0.1423  0.3031  501      9500
OpenBUGS> samplesStats('c')
                mean    sd      MC_error        val2.5pc        median  val97.5pcstart   sample
        c       4.053   1.08    0.02896 2.202   3.974   6.373   1001    10000
OpenBUGS> samplesStats('r')
                mean    sd      MC_error        val2.5pc        median  val97.5pcstart   sample
        r       0.01162 0.002929        7.846E-5        0.007387        0.01119 0.01848  1001    10000

I tried to replicate this in PyMC 2.3.2 with the following code. Full replication code is available here

def cox_model(dta):
    (t, obs_t, pscenter, hhcenter, ncomact, rleader,
    dleader, inter1, inter2, fail) = load_data_cox()

    T = len(t) - 1
    nsubj = len(obs_t)

    # risk set equals one if obs_t >= t
    Y = np.array([[int(obs >= time) for time in t] for obs in obs_t])
    # counting process. jump = 1 if obs_t \in [t[j], t[j+1])
    dN = np.array([[Y[i,j]*int(t[j+1] >= obs_t[i])*fail[i] for i in range(nsubj)] for j in range(T)])

    c = Gamma('c', .0001, .00001, value=.1)
    r = Gamma('r', .001, .0001, value=.1)
    dL0_star = r*np.array([t[j+1] - t[j] for j in range(T)])
    mu = dL0_star * c # prior mean hazard
    dL0 = Gamma('dL0', mu, c, value=np.ones(T))

    beta = Normal('beta', np.zeros(7), np.ones(7)*.00001, 
                  value=np.array([-.36, -.26, -.29, -.22, -.61, -9.73, -.23]))

    @deterministic
    def idt(b1=beta, dl0=dL0):
        mu_ = [[Y[i,j] * np.exp(b1[0]*pscenter[i] + b1[1]*hhcenter[i] + 
                                b1[2]*ncomact[i] + b1[3]*rleader[i] + 
                                b1[4]*dleader[i] + b1[5]*inter1[i] + 
                                b1[6]*inter2[i])*dl0[j] for i in range(nsubj)] 
                                for j in range(T)] # intensity
        return mu_

    dn_like = Poisson('dn_like', idt, value=dN, observed=True)

    return locals()

m = MCMC(cox_model())
m.sample(15000)

However, I do not come close to the same point estimates. I get something like

beta:

        Mean             SD               MC Error        95% HPD interval
        ------------------------------------------------------------------
        -0.537           1.094            0.099            [-2.549  1.492]
        0.276            0.048            0.004            [ 0.184  0.36 ]
        -1.092           0.385            0.038            [-1.559 -0.371]
        -1.461           0.746            0.073            [-2.986 -0.496]
        -1.865           0.382            0.038            [-2.471 -1.329]
        3.778            1.539            0.133            [ 1.088  6.623]
        -0.449           0.109            0.01             [-0.661 -0.26 ]


        Posterior quantiles:

        2.5             25              50              75             97.5
        |---------------|===============|===============|---------------|
        -2.892           -1.274          -0.385         0.268         1.253
        0.191            0.244           0.278          0.305         0.374
        -1.553           -1.434          -1.179         -0.793        -0.258
        -3.132           -1.856          -1.196         -0.904        -0.526
        -2.471           -2.199          -1.864         -1.632        -1.201
        1.287            2.685           3.601          4.72          7.262
        -0.714           -0.519          -0.445         -0.368        -0.273

Most worryingly, the signs are different. I thought maybe it was just a convergence issue, so I ran it overnight with 50,000 iterations without much change. Maybe there's some bug or difference in my PyMC model, particularly with the dL0 specification?

I've tried with different starting values. I've tried letting the model run for many iterations. I've centered the priors on the point estimates from BUGS.

Abraham D Flaxman · Accepted Answer

I think that the issue is non-convergence, as you thought, and the only substantial difference between the PyMC2 and BUGS implementations are the step methods and the burn-in period. To investigate thisI made a change to idt to make it run faster, but this has the same values for idt:

X = np.array([pscenter, hhcenter, ncomact, rleader, dleader, inter1, inter2]).T
@deterministic
def idt(beta=beta, dL0=dL0):
    intensity = np.exp(np.dot(X, beta))
    return np.transpose(Y[:,:T] * np.outer(intensity, dL0))

With this in place, plotting the trace of beta shows that MCMC has not converged in 50,000 iterations:

traceplot of default MCMC

There are a few things that I recommend: different starting values, different step methods, and a burn-in interval comparable to the BUGS burn-in:

vars = cox_model(dta)
pm.MAP(vars).fit(method='fmin_powell')
m = pm.MCMC(vars)
m.use_step_method(pm.AdaptiveMetropolis, m.beta)
m.sample(iter=50000, burn=25000, thin=25)

Plotting the trace in this case shows something more promising:

traceplot of different MCMC

This produces point estimates which are much more like the BUGS estimates you have above:

beta:

Mean             SD               MC Error        95% HPD interval
------------------------------------------------------------------
3.436            0.861            0.035            [ 1.827  5.192]
-0.039           0.063            0.002            [-0.155  0.081]
-0.028           0.073            0.003            [-0.159  0.119]
0.338            0.174            0.007            [ 0.009  0.679]
0.069            0.164            0.007            [-0.263  0.371]
-4.022           1.29             0.055            [-6.552 -1.497]
0.136            0.085            0.003            [-0.027  0.307]

Difference between BUGS model and PyMC?

Answers (1)

Related Questions