Reputation: 1357
Suppose I have a hierarchical Bayesian model with $V$ first-level nodes, where $V$ is very large, and I am going to do $S$ simulations. My thinking is that I could benefit from parallelizing the computation of each of those first-level nodes, and of course from running multiple chains in parallel. So I would have two for or *apply levels: one for the parallelization of the multiple chains, and one for the parallelization of the first-level node computations within an iteration of a particular chain. In what R packages, if any, is this possible? Thank you.
As requested, here is some high-level pseudo-code for something I'd want to do:
for node in top.cluster {
    for draw in simulation {
        draw population.level.variables from population.level.conditionals
        for node in bottom.cluster {
            draw random.effect[node] from random.effect.conditionals[node]
        }
    }
}
Does this make more sense?
Upvotes: 1
Views: 211
Reputation: 14093
In general, it is best to parallelize at the outermost level of the calculation, as that avoids communication overhead as much as possible. Unless you give us more specifics, I don't see the point in parallelizing at two explicit levels of the code.
Here are some exceptions:
Of course, that is not (easily) possible if each iteration of your outer loop depends on the results of the previous one.
Another caveat is that you need sufficient memory for this high-level parallelization, as (possibly) n copies of the data have to be held in RAM.
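As a rough illustration of parallelizing only at the outermost level, here is a minimal sketch that runs one chain per worker with the parallel package (it ships with R and provides the snow-style cluster API). The model is a toy assumption and not the asker's: y[i, j] ~ N(theta[j], 1), theta[j] ~ N(mu, 1), with a flat prior on mu. The names run_chain, y, V and n_draws are hypothetical. Inside a chain, the first-level nodes are conditionally independent given mu, so a single vectorized draw covers all of them and no second explicit level of parallelism is used.

```r
library(parallel)  # ships with R; snow-style cluster functions

# Hypothetical toy Gibbs sampler: one call runs one whole chain.
run_chain <- function(chain_id, y, n_draws) {
  V <- ncol(y)          # number of first-level nodes
  n <- nrow(y)          # observations per node
  ybar <- colMeans(y)
  mu <- 0
  mu_draws <- numeric(n_draws)
  for (s in seq_len(n_draws)) {
    # First-level nodes: conditionally independent given mu,
    # so all V of them are drawn in one vectorized call.
    theta <- rnorm(V, mean = (n * ybar + mu) / (n + 1), sd = sqrt(1 / (n + 1)))
    # Population level: mu | theta ~ N(mean(theta), 1 / V) under a flat prior.
    mu <- rnorm(1, mean = mean(theta), sd = sqrt(1 / V))
    mu_draws[s] <- mu
  }
  mu_draws
}

# Simulated data, purely for illustration.
set.seed(1)
V <- 1000
y <- matrix(rnorm(5 * V, mean = 2), nrow = 5)

n_chains <- 2
cl <- makeCluster(n_chains)          # each worker receives its own copy of y
clusterSetRNGStream(cl, iseed = 42)  # independent random number streams per worker
chains <- parLapply(cl, seq_len(n_chains), run_chain, y = y, n_draws = 200)
stopCluster(cl)
```

Note that the draws within a chain stay sequential, since each iteration depends on the previous one; only the chains themselves are independent and therefore run in parallel. Each worker also holds its own copy of y, which is the memory cost mentioned above.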
In R, you can do implicitly* parallelized matrix calculations by using a parallelized BLAS (I use OpenBLAS), which also doesn't need more memory. Depending on how much of your calculations are done by the BLAS, you may want to tweak the "outer" parallelization and the number of threads used by the BLAS.
* without any change to your code
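One way to balance the "outer" parallelization against the BLAS threads might look like the sketch below. It assumes the optional RhpcBLASctl package for changing the BLAS thread count at run time; the helper blas_threads_per_worker and the chain count are hypothetical, and whether the setting has any effect depends on which BLAS your R is linked against.

```r
library(parallel)

# Hypothetical helper: split the available cores evenly among the workers,
# leaving at least one BLAS thread per worker.
blas_threads_per_worker <- function(n_workers, n_cores) {
  max(1L, n_cores %/% n_workers)
}

n_cores <- detectCores()             # may be NA on some platforms
if (is.na(n_cores)) n_cores <- 1L
n_chains <- 4L                       # assumed number of chains
n_workers <- min(n_chains, n_cores)
blas_threads <- blas_threads_per_worker(n_workers, n_cores)

# Apply the setting only if the optional package is installed.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(blas_threads)
}

# Matrix products such as crossprod() are handled by the BLAS,
# so this line itself needs no change to run multithreaded.
X <- matrix(rnorm(200 * 50), nrow = 200)
XtX <- crossprod(X)
```

With a cluster of workers, the same setting would have to be applied on each worker (for example through clusterCall), since every worker is a separate R process.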
Here's the high-performance computation task view, which gives you an overview of packages.
Personally, I mostly use snow + the parallelized BLAS.
Upvotes: 2