goldisfine

Reputation: 4850

How to speed up Stan when fitting a random-effects model on a large, sparse data frame?

I'm trying to fit a random-effects model using rstan. My design matrix has 198 columns. It's so wide because my original data frame consists of factor variables, which I convert to binary indicator columns in order to fit the model in Stan. I'm able to fit models with a few columns converted from one or two predictors, but with the full matrix it has taken 10 hours to complete half of the sampling.

Here is the Stan code I'm using to fit the model (a basic linear model with a group-level intercept). I've tried to vectorize, but maybe there's a way to optimize further? Also, what is the intuition for why it is taking so long?

    data {
      int<lower=0> N;                      // number of observations
      int<lower=0> J;                      // number of groups
      int<lower=0> K;                      // number of predictor columns
      array[N] int<lower=1,upper=J> geo;   // group index per observation
      matrix[N, K] X;
      vector[N] y;
    }
    parameters {
      vector[J] a;
      vector[K] B;
      real mu_a;
      real<lower=0,upper=100> sigma_a;
      real<lower=0,upper=100> sigma_y;
    }
    model {
      vector[N] y_hat;
      for (i in 1:N)
        y_hat[i] = a[geo[i]];
      mu_a ~ normal(0, 1);
      a ~ normal(0, 1);
      y ~ normal(mu_a + sigma_a * y_hat + X * B, sigma_y);
    }
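As a side note on the vectorization question (a sketch, not part of the original post): the per-observation loop that builds y_hat can be replaced with Stan's multi-indexing, where indexing a vector with an integer array gathers all the group effects at once, so the whole model block collapses to:

```stan
model {
  mu_a ~ normal(0, 1);
  a ~ normal(0, 1);
  // a[geo] picks out the group effect for each observation in one statement
  y ~ normal(mu_a + sigma_a * a[geo] + X * B, sigma_y);
}
```

This mostly saves autodiff bookkeeping rather than changing the geometry, so it helps per-gradient cost but not the number of leapfrog steps.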

Upvotes: 3

Views: 2711

Answers (1)

Ben Goodrich

Reputation: 4990

The question of what you can do to speed this model up is intertwined with the question of how it can be sampled more efficiently. The intuition for why it is taking so long likely has to do with the prior dependence between a and sigma_a (and, to a lesser extent, mu_a):

  1. When sigma_a is small for some iteration, the elements of a have to be close to mu_a.
  2. When sigma_a is large for some iteration, the elements of a can be far from mu_a.

Since Stan only has one stepsize parameter to manipulate, it can be difficult to find a stepsize that is appropriate in both situations (this is the well-known "funnel" geometry of hierarchical models). At best, you get a stepsize small enough to preserve accuracy in the former case, but wall time suffers because the sampler has to take many leapfrog steps with that small stepsize in the latter case.

For models like these, we generally recommend a non-centered reparameterization like

    data {
      int<lower=0> N;
      int<lower=0> J; // 47 apparently but don't hardcode it
      int<lower=0> K;
      array[N] int<lower=1,upper=J> geo;
      matrix[N, K] X;
      vector[N] y;
    }
    parameters {
      vector[J] a;    // standardized group effects
      vector[K] B;
      real mu_a;
      real<lower=0,upper=100> sigma_a;
      real<lower=0,upper=100> sigma_y;
    }
    model {
      vector[N] y_hat;
      for (i in 1:N)
        y_hat[i] = a[geo[i]];
      mu_a ~ normal(0, 1);
      // implies group intercepts mu_a + sigma_a * a ~ normal(mu_a, sigma_a)
      a ~ normal(0, 1);
      y ~ normal(mu_a + sigma_a * y_hat + X * B, sigma_y);
    }

Upvotes: 8
