Reputation: 4850
I'm trying to fit a random effect model using RSTAN. My design matrix has 198 columns. It's so wide because my original dataframe is a bunch of factor variables, which I convert to binary indicators in order to try to fit the model in STAN. I'm able to fit models with a few columns converted from one or two predictors, but it has taken 10 hours to complete 1/2 of the samplings.
Here is the STAN code which I'm using to try to fit the model (basic linear model). I've tried to vectorize, but maybe there's a way to optimize further? Also, what is the intuition behind why it is taking so long?
data {
int<lower=0> N;
int<lower=0> J;
int<lower=0> K;
int<lower=1,upper=J> geo[N];
matrix[N,K] X;
vector[N] y;
}
parameters {
vector[J] a;
vector[K] B;
real mu_a;
real<lower=0,upper=100> sigma_a;
real<lower=0,upper=100> sigma_y;
}
model {
vector[N] y_hat;
for (i in 1:N)
y_hat[i] <- a[geo[i]];
mu_a ~ normal(0, 1);
a ~ normal(0, 1);
y ~ normal(mu_a + sigma_a * y_hat + X * B, sigma_y);
}
Upvotes: 3
Views: 2711
Reputation: 4990
The question of what can you do to speed this model up is intertwined with the question of how can it be sampled more efficiently. The intuition as to why it is taking so long may have to do with the prior dependence between a
and sigma_a
(and to a lesser extent mu_a
).
sigma_a
is small for some iteration, the elements of a
have to be close to mu_a
sigma_a
is large for some iteration, the elements of a
can be far from mu_a
. Since Stan only has one stepsize parameter to manipulate, it can be difficult to find a stepsize that is appropriate in both situations. At best, you get a stepsize that is small enough to preserve accuracy in the former case but the walltime will be adversely affected because it has to take many leapfrog steps with a small stepsize in the latter case.
For models like these, we generally recommend reparameterizing like
data {
int<lower=0> N;
int<lower=0> J; // 47 apparently but don't hardcode it
int<lower=1,upper=J> geo[N];
vector[N] y;
}
parameters {
vector[J] a;
real mu_a;
real<lower=0,upper=100> sigma_a;
real<lower=0,upper=100> sigma_y;
}
model {
vector[N] y_hat;
for (i in 1:N)
y_hat[i] <- a[geo[i]];
mu_a ~ normal(0, 1);
a ~ normal(0, 1);
y ~ normal(mu_a + sigma_a * y_hat, sigma_y);
}
Upvotes: 8