I have the following model: an exponential GLM, in which the responses are exponentially distributed with a rate that depends on the predictors through a log link.
I wish to learn how to implement this model.
Research and Attempt
I’ve had a look at the following Poisson GLM as an example:
```{stan output.var="PoissonGLMQR"}
data {
  int<lower=1> n;    // number of observations
  int<lower=1> p;    // number of predictors
  matrix[n, p] x;    // design matrix
  int<lower=0> y[n]; // responses
}
transformed data {
  matrix[n, p] Q_ast;
  matrix[p, p] R_ast;
  matrix[p, p] R_ast_inverse;
  Q_ast = qr_Q(x)[, 1:p] * sqrt(n - 1.0);
  R_ast = qr_R(x)[1:p, ] / sqrt(n - 1.0);
  R_ast_inverse = inverse(R_ast);
}
parameters {
  vector[p] theta; // regression coefficients for predictors
}
transformed parameters {
  vector[p] beta;
  vector[n] mu;
  mu = exp(Q_ast * theta);
  beta = R_ast_inverse * theta;
}
model {
  y ~ poisson(mu);
}
```
I understand that this example uses the QR reparameterization (see section 9.2 of the Stan reference manual). However, since I’m new to Stan, I find this quite intimidating, and I’m unsure how to implement the exponential GLM in the same way. Does one even need to use the QR reparameterization for the exponential case?
Anyway, my naive attempt at the exponential GLM is the following adaptation of the Poisson GLM code:
```{stan output.var="ExponentialGLM"}
data {
  int<lower=1> n;     // number of observations
  int<lower=1> p;     // number of predictors
  matrix[n, p] x;     // design matrix
  real<lower=0> y[n]; // responses
}
transformed data {
  matrix[n, p] Q_ast;
  matrix[p, p] R_ast;
  matrix[p, p] R_ast_inverse;
  Q_ast = qr_Q(x)[, 1:p] * sqrt(n - 1.0);
  R_ast = qr_R(x)[1:p, ] / sqrt(n - 1.0);
  R_ast_inverse = inverse(R_ast);
}
parameters {
  vector[p] theta; // regression coefficients for predictors
}
transformed parameters {
  vector[p] beta;
  vector[n] mu;
  mu = exp(Q_ast * theta);
  beta = R_ast_inverse * theta;
}
model {
  y ~ exponential(mu);
}
```
But I’m totally unsure whether this is even the right way to go about coding such a model in Stan; all I did was adapt the Poisson GLM example to the aforementioned exponential GLM.
I am seeking a demonstration of the exponential GLM and clarification of the misunderstandings above.
Sample Data
conc time
1 6.1 0.8
2 4.2 3.5
3 0.5 12.4
4 8.8 1.1
5 1.5 8.9
6 9.2 2.4
7 8.5 0.1
8 8.7 0.4
9 6.7 3.5
10 6.5 8.3
11 6.3 2.6
12 6.7 1.5
13 0.2 16.6
14 8.7 0.1
15 7.5 1.3
How about something like this?
We start by reading in the sample data.
df <- read.table(text =
" conc time
1 6.1 0.8
2 4.2 3.5
3 0.5 12.4
4 8.8 1.1
5 1.5 8.9
6 9.2 2.4
7 8.5 0.1
8 8.7 0.4
9 6.7 3.5
10 6.5 8.3
11 6.3 2.6
12 6.7 1.5
13 0.2 16.6
14 8.7 0.1
15 7.5 1.3", header = T);
We define the Stan model as a simple exponential model with a log link (the exponential distribution is the special case of the Gamma distribution with shape 1).
model <- "
data {
  int N;          // Number of observations
  int K;          // Number of parameters
  real y[N];      // Response
  matrix[N, K] X; // Model matrix
}
parameters {
  vector[K] beta; // Model parameters
}
transformed parameters {
  vector[N] lambda;
  // Log link: the exponential rate is exp(-X * beta),
  // so the mean is 1 / lambda = exp(X * beta).
  lambda = exp(-X * beta);
}
model {
  // Priors
  beta[1] ~ cauchy(0, 10);
  for (i in 2:K)
    beta[i] ~ cauchy(0, 2.5);
  y ~ exponential(lambda);
}
"
Here I assume (standard) weakly informative Cauchy priors on the beta parameters: Cauchy(0, 10) on the intercept and Cauchy(0, 2.5) on the remaining coefficients.
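To get a sense of how weakly informative these priors are, one can inspect their central quantiles in R (a quick illustrative check; not part of the model fit):
```r
# Central 95% intervals of the two Cauchy priors (illustrative):
qcauchy(c(0.025, 0.975), location = 0, scale = 10)  # intercept prior
qcauchy(c(0.025, 0.975), location = 0, scale = 2.5) # slope prior
```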
We now fit the model to the data.
library(rstan);
options(mc.cores = parallel::detectCores());
rstan_options(auto_write = TRUE);
X <- model.matrix(time ~ conc, data = df);
model.stan <- stan(
model_code = model,
data = list(
N = nrow(df),
K = ncol(X),
y = df$time,
X = X),
iter = 4000);
To compare point estimates, we also fit a Gamma GLM to the data using glm (since the exponential distribution is a Gamma with shape 1, the coefficient estimates are directly comparable).
model.glm <- glm(time ~ conc, data = df, family = Gamma(link = "log"));
The Stan model parameter estimates are
summary(model.stan, "beta")$summary;
# mean se_mean sd 2.5% 25% 50%
#beta[1] 2.9371457 0.016460000 0.62017934 1.8671652 2.5000356 2.8871936
#beta[2] -0.3099799 0.002420744 0.09252423 -0.5045937 -0.3708111 -0.3046333
# 75% 97.5% n_eff Rhat
#beta[1] 3.3193334 4.3078478 1419.629 1.002440
#beta[2] -0.2461567 -0.1412095 1460.876 1.002236
The GLM model parameter estimates are
summary(model.glm)$coef;
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.8211487 0.54432115 5.182875 0.0001762685
#conc -0.3013355 0.08137986 -3.702827 0.0026556952
We see good agreement between the Stan posterior summaries and the Gamma-GLM parameter estimates.
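For a quick numerical side-by-side, one can put the posterior medians next to the glm point estimates (a small helper sketch, not part of the original comparison):
```r
# Posterior medians vs. glm point estimates (illustrative):
cbind(
  stan_median  = summary(model.stan, "beta")$summary[, "50%"],
  glm_estimate = coef(model.glm))
```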
We plot parameter estimates from both the Stan and glm models.
library(tidyverse);
as.data.frame(summary(model.stan, "beta")$summary) %>%
rownames_to_column("Parameter") %>%
ggplot(aes(x = `50%`, y = Parameter)) +
geom_point(size = 3) +
geom_segment(aes(x = `2.5%`, xend = `97.5%`, yend = Parameter), size = 1) +
geom_segment(aes(x = `25%`, xend = `75%`, yend = Parameter), size = 2) +
labs(x = "Median (plus/minus 95% and 50% CIs)") +
geom_point(
data = data.frame(summary(model.glm)$coef, row.names = c("beta[1]", "beta[2]")) %>%
rownames_to_column("Parameter"),
aes(x = Estimate, y = Parameter),
colour = "red", size = 4, shape = 1)
glm estimates are shown in red.
Stan model with QR re-parametrisation.
model.QR <- "
data {
  int N;          // Number of observations
  int K;          // Number of parameters
  real y[N];      // Response
  matrix[N, K] X; // Model matrix
}
transformed data {
  matrix[N, K] Q;
  matrix[K, K] R;
  matrix[K, K] R_inverse;
  Q = qr_Q(X)[, 1:K];
  R = qr_R(X)[1:K, ];
  R_inverse = inverse(R);
}
parameters {
  vector[K] theta;
}
transformed parameters {
  vector[N] lambda;
  lambda = exp(-Q * theta); // log-link
}
model {
  y ~ exponential(lambda);
}
generated quantities {
  vector[K] beta; // Model parameters
  beta = R_inverse * theta;
}
"
In the QR decomposition X = Q * R, we have lambda = exp(-X * beta) = exp(-Q * R * beta) = exp(-Q * theta), where theta = R * beta and therefore beta = R^-1 * theta.
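These identities are easy to verify numerically with base R's qr() (an illustrative sanity check; the coefficient vector below is arbitrary):
```r
# Check X = Q * R and beta = R^-1 * theta (illustrative):
qr.X <- qr(X)
Q <- qr.Q(qr.X)
R <- qr.R(qr.X)
max(abs(Q %*% R - X))               # ~ 0, i.e. X = Q * R
beta <- c(2.8, -0.3)                # arbitrary coefficient vector
theta <- R %*% beta
max(abs(solve(R) %*% theta - beta)) # ~ 0, i.e. beta = R^-1 * theta
```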
Note that we do not specify any priors on theta; in Stan this defaults to flat (i.e. uniform) priors.
We now fit the QR model to the data.
library(rstan);
options(mc.cores = parallel::detectCores());
rstan_options(auto_write = TRUE);
X <- model.matrix(time ~ conc, data = df);
model.stan.QR <- stan(
model_code = model.QR,
data = list(
N = nrow(df),
K = ncol(X),
y = df$time,
X = X),
iter = 4000);
The QR-model parameter estimates are
summary(model.stan.QR, "beta")$summary;
# mean se_mean sd 2.5% 25% 50%
#beta[1] 2.9637547 0.009129250 0.64383609 1.8396681 2.5174800 2.9194682
#beta[2] -0.3134252 0.001321584 0.09477156 -0.5126905 -0.3740475 -0.3093937
# 75% 97.5% n_eff Rhat
#beta[1] 3.3498585 4.3593912 4973.710 0.9998545
#beta[2] -0.2478029 -0.1395686 5142.408 1.0003236
You can see that parameter estimates between the two Stan models (with and without QR re-parametrisation) agree very well.
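One place where the two fits do differ is sampling efficiency: the summaries above show noticeably larger effective sample sizes for the QR fit. A quick way to pull those numbers out (illustrative):
```r
# Effective sample sizes, plain vs. QR parametrisation (illustrative):
summary(model.stan, "beta")$summary[, "n_eff"]    # ~ 1420 and 1461
summary(model.stan.QR, "beta")$summary[, "n_eff"] # ~ 4974 and 5142
```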
If you were to ask me whether the QR re-parametrisation makes a (big/any) difference, I'd say "probably not in this case". Andrew Gelman and others have often emphasised that even very weakly informative priors help with convergence and should be preferred over flat (uniform) priors. I would always try to use weakly informative priors on all parameters and start with a model without the QR re-parametrisation; if convergence turns out to be poor, I would then try to optimise the model in a next step.