goldisfine

Reputation: 4850

How to speed up Stan when fitting a random-effects model on a large, sparse data frame?

I'm trying to fit a random-effects model using rstan. My design matrix has 198 columns. It's so wide because my original data frame consists of factor variables, which I convert to binary indicator columns in order to fit the model in Stan. I'm able to fit models with a few columns converted from one or two predictors, but with the full matrix it has taken 10 hours to complete half of the sampling.

Here is the Stan code I'm using to fit the model (a basic linear model with a group-level intercept). I've tried to vectorize, but maybe there's a way to optimize further? Also, what is the intuition for why it is taking so long?

    data {
      int<lower=0> N;                      // number of observations
      int<lower=0> J;                      // number of groups
      int<lower=0> K;                      // number of predictor columns
      array[N] int<lower=1,upper=J> geo;   // group index per observation
      matrix[N, K] X;
      vector[N] y;
    }
    parameters {
      vector[J] a;
      vector[K] B;
      real mu_a;
      real<lower=0,upper=100> sigma_a;
      real<lower=0,upper=100> sigma_y;
    }
    model {
      vector[N] y_hat;
      for (i in 1:N)
        y_hat[i] = a[geo[i]];
      mu_a ~ normal(0, 1);
      a ~ normal(0, 1);
      y ~ normal(mu_a + sigma_a * y_hat + X * B, sigma_y);
    }
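As a side note on the vectorization question (a sketch, not part of the original post): the per-observation loop that builds y_hat can be replaced with Stan's multi-indexing, where indexing a vector with an integer array gathers all the group effects at once, so the whole model block collapses to:

```stan
model {
  mu_a ~ normal(0, 1);
  a ~ normal(0, 1);
  // a[geo] picks out the group effect for each observation in one statement
  y ~ normal(mu_a + sigma_a * a[geo] + X * B, sigma_y);
}
```

This mostly saves autodiff bookkeeping rather than changing the geometry, so it helps per-gradient cost but not the number of leapfrog steps.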

Upvotes: 3

Views: 2711

Answers (1)

Ben Goodrich

Reputation: 4990

The question of what you can do to speed this model up is intertwined with the question of how it can be sampled more efficiently. The intuition for why it is taking so long likely has to do with the prior dependence between a and sigma_a (and, to a lesser extent, mu_a):

  1. When sigma_a is small for some iteration, the elements of a have to be close to mu_a.
  2. When sigma_a is large for some iteration, the elements of a can be far from mu_a.

Since Stan only has one stepsize parameter to manipulate, it can be difficult to find a stepsize that is appropriate in both situations (this is the well-known "funnel" geometry of hierarchical models). At best, you get a stepsize small enough to preserve accuracy in the former case, but wall time suffers because the sampler has to take many leapfrog steps with that small stepsize in the latter case.

For models like these, we generally recommend a non-centered reparameterization like

    data {
      int<lower=0> N;
      int<lower=0> J; // 47 apparently but don't hardcode it
      int<lower=0> K;
      array[N] int<lower=1,upper=J> geo;
      matrix[N, K] X;
      vector[N] y;
    }
    parameters {
      vector[J] a;    // standardized group effects
      vector[K] B;
      real mu_a;
      real<lower=0,upper=100> sigma_a;
      real<lower=0,upper=100> sigma_y;
    }
    model {
      vector[N] y_hat;
      for (i in 1:N)
        y_hat[i] = a[geo[i]];
      mu_a ~ normal(0, 1);
      // implies group intercepts mu_a + sigma_a * a ~ normal(mu_a, sigma_a)
      a ~ normal(0, 1);
      y ~ normal(mu_a + sigma_a * y_hat + X * B, sigma_y);
    }

Upvotes: 8
