Kota Mori
Kota Mori

Reputation: 6750

How to define a set of vectors with various length in Stan model

I have a set of time-series variables with different length, and I am trying to estimate a hierarchical autoregressive model using the Stan language (using rstan). I have learned the basics of Stan, but am not sure how I should express a set of vectors with differing length.

I ended up augmenting the shorter vectors by filling arbitrary numbers so that all vectors have the same length, and ignored these filled values in the model section.

While this seems to be working, I would like to know if there are better ways to define this type of data. Any suggestion is appreciated!

I am on Ubuntu 14.04 with R version 3.2.3 and rstan_2.9.0.

Data

My data looks like this. Note that there are three vectors in y, each having length 3, 5, and 10. I filled the shorter vectors with 100 so that all vectors are of size 10.

$N
[1]  3  5 10

$K
[1] 3

$maxN
[1] 10

$y
$y[[1]]
 [1]  -0.626453811  -0.069571811  -0.004434404 100.000000000 100.000000000 100.000000000 100.000000000
 [8] 100.000000000 100.000000000 100.000000000

$y[[2]]
 [1]   1.5952808   0.9305912   0.4832488   0.3903673   0.3690161 100.0000000 100.0000000 100.0000000 100.0000000
[10] 100.0000000

$y[[3]]
 [1] 0.5757814 0.5876644 0.7800761 0.8410528 0.7948234 0.5938711 0.7469771 0.7677860 0.7893884 0.9048332

Stan code

My Stan code is as follows:

data {
  int<lower=0> K;     // number of series
  int<lower=0> N[K];  // length of each series

  int<lower=0> maxN;  // maximum length of series
  vector[maxN] y[K];  // time series (define as the same length)
}

parameters {
  // For simplicity, assume only beta varies across series
  real alpha;
  real beta[K];
  real<lower=0> sigma;

  real beta0;
  real<lower=0> beta_sig; 
}

model {
  for (k in 1:K) {
    beta[k] ~ normal(beta0, beta_sig);
    y[k][2:N[k]] ~ normal(alpha + beta[k]*y[k][1:(N[k]-1)], sigma);
  }
}

R script

The code below reproduces my data, Stan code and the fitting in R.

library(rstan)
dat <- structure(list(N = c(3, 5, 10), K = 3, maxN = 10, y = list(c(-0.626453810742332,
                                                                    -0.0695718108004915, -0.00443440448115216, 100, 100, 100, 100,
                                                                    100, 100, 100), c(1.59528080213779, 0.930591178250432, 0.483248750713414,
                                                                                      0.390367280599556, 0.3690161108127, 100, 100, 100, 100, 100),
                                                                  c(0.575781351653492, 0.587664377772507, 0.780076056840342,
                                                                    0.841052774797451, 0.794823439263525, 0.593871106619423,
                                                                    0.746977087771791, 0.767786018093089, 0.789388389973885,
                                                                    0.904833172045027))), .Names = c("N", "K", "maxN", "y"))
stancode <- structure("data {\n  int<lower=0> K;     // number of series\n  int<lower=0> N[K];  // length of each series\n\n  int<lower=0> maxN;  // maximum length of series\n  vector[maxN] y[K];  // time series\n}\n\nparameters {\n  // For simplicity, assume only beta varies across series\n  real alpha;\n  real beta[K];\n  real<lower=0> sigma;\n\n  real beta0;\n  real<lower=0> beta_sig;\n}\n\nmodel {\n  for (k in 1:K) {\n    beta[k] ~ normal(beta0, beta_sig);\n    y[k][2:N[k]] ~ normal(alpha + beta[k]*y[k][1:(N[k]-1)], sigma);\n  }\n}", model_name2 = "HierarchicalAR")
fit <- stan(model_code = stancode, data = dat,
            pars = c("alpha", "beta", "sigma", "beta0", "beta_sig"))

Upvotes: 2

Views: 2683

Answers (1)

Ben Goodrich
Ben Goodrich

Reputation: 4990

Currently, there are no ragged array data structures in Stan, so there is no elegant way to accomplish your task. In this situation, I think padding the shorter vectors so that they are all the same length is the easiest one, although I would use more extreme values to make indexing mistakes easier to detect.

Upvotes: 1

Related Questions