Reputation: 671
Is there a way to speed up this simple PyMC model? On 20-40 data points, it takes ~5-11 seconds to fit.
import pymc
import time
import numpy as np
from collections import OrderedDict
# prior probability of rain
p_rain = 0.5
variables = OrderedDict()
# rain observations
data = [True, True, True, True, True,
False, False, False, False, False]*4
num_steps = len(data)
p_rain_given_rain = 0.9
p_rain_given_norain = 0.2
p_umbrella_given_rain = 0.8
p_umbrella_given_norain = 0.3
for n in range(num_steps):
if n == 0:
# Rain node at time t = 0
rain = pymc.Bernoulli("rain_%d" %(n), p_rain)
else:
rain_trans = \
pymc.Lambda("rain_trans",
lambda prev_rain=variables["rain_%d" %(n-1)]: \
prev_rain*p_rain_given_rain + (1-prev_rain)*p_rain_given_norain)
rain = pymc.Bernoulli("rain_%d" %(n), p=rain_trans)
umbrella_obs = \
pymc.Lambda("umbrella_obs",
lambda rain=rain: \
rain*p_umbrella_given_rain + (1-rain)*p_umbrella_given_norain)
umbrella = pymc.Bernoulli("umbrella_%d" %(n), p=umbrella_obs,
observed=True,
value=data[n])
variables["rain_%d" %(n)] = rain
variables["umbrella_%d" %(n)] = umbrella
print "running on %d points" %(len(data))
all_vars = variables.values()
t_start = time.time()
model = pymc.Model(all_vars)
m = pymc.MCMC(model)
m.sample(iter=2000)
t_end = time.time()
print "\n%.2f secs to run" %(t_end - t_start)
With only 40 data points, it takes 11 seconds to run:
running on 40 points
[-----------------100%-----------------] 2000 of 2000 complete in 11.5 sec
11.54 secs to run
(with 80 points it takes 20 seconds). This is a toy example. The expressions inside Lambda()
that determine transitions are in practice more complex. This basic code structure is flexible (whereas encoding the model with transition matrices is less flexible). Is there a way to keep a similar code structure but get better performance? Happy to switch to PyMC3 if necessary. Thanks.
Upvotes: 13
Views: 1790
Reputation: 11781
Markov-Chain Monte Carlo is a known sequential problem.
This means its runtime proportional to number of steps and run time of your fitness function.
There are some tricks you can do, however:
Harder approaches:
Finally there's a lot of research out there:
http://www.mas.ncl.ac.uk/~ndjw1/docs/pbc.pdf
https://sites.google.com/site/parallelmcmc/
http://pyinsci.blogspot.com/2010/12/efficcient-mcmc-in-python.html (pypy)
Upvotes: 6