Reputation: 2386
I'm trying to use Stan and R to fit a model that, uhh, models the observed realisations y_i = 16, 9, 10, 13, 19, 20, 18, 17, 35, 55, which are from a binomial distributed random variable, say, Y_i, with parameters m_i (number of trials) and p_i (success probability in each trial).
yi = c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55)
For the purposes of this experiment, I'm going to assume that all of the m_i are fixed and given by m_i = 74, 99, 58, 70, 122, 77, 104, 129, 308, 119.
mi = c(74, 99, 58, 70, 122, 77, 104, 129, 308, 119)
I'm going to use Jeffrey's prior: \alpha=0.5 and \beta=0.5.
alpha = 0.5, beta = 0.5
I'm trying to
My attempt at 2. is this section of code:
real k;
real mx = 0;
real mn = 0;
if (p > mx)
mx = p;
if (mn > p) {
mn = p;
}
k = mx - mn;
My Stan code is as follows:
```{stan output.var="BinModBeta"}
data {
int <lower = 1> mi[10];
int <lower = 0> yi[10];
real <lower = 0> alpha;
real <lower = 0> beta;
}
parameters {
real <lower = 0, upper = 1> p[10];
}
transformed parameters {
real k;
real mx = 0;
real mn = 0;
if (p > mx)
mx = p;
if (mn > p) {
mn = p;
}
k = mx - mn;
}
model {
yi ~ binomial(mi, p);
p ~ beta(alpha, beta);
}
```
My R code is as follows:
```{r}
library(rstan)
```
```{r}
data.in <- list(mi = c(74, 99, 58, 70, 122, 77, 104, 129, 308, 119), yi = c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55), alpha = 0.5, beta = 0.5)
model.fit1 <- sampling(BinModBeta, data=data.in)
```
```{r}
print(model.fit1, pars = c("p"), probs=c(0.1,0.5,0.9), digits = 5)
```
Now, I just started learning Stan, so I'm honestly not sure if this is correct at all. However, it seems like this code works for my first aim (at least, whatever I've coded seems to work ...). But my trouble starts when attempting to code my second aim.
When I attempt to compile the Stan code above, I get the following error:
Now, based on this error message, it seems like my issue is arising from the fact that p is a vector of 10 real values, rather than a single real. However, due to my inexperience with Stan, I'm unsure of how to get around this problem.
Upvotes: 3
Views: 2737
Reputation: 50718
Here is what I would do:
model <- "
data {
int <lower = 1> mi[10];
int <lower = 0> yi[10];
real <lower = 0> alpha;
real <lower = 0> beta;
}
parameters {
real <lower = 0, upper = 1> p[10];
}
model {
p ~ beta(alpha, beta); // Prior
yi ~ binomial(mi, p); // Likelihood
}
generated quantities {
real k;
k = max(p) - min(p);
}
"
library(rstan);
yi = c(16, 9, 10, 13, 19, 20, 18, 17, 35, 55);
mi = c(74, 99, 58, 70, 122, 77, 104, 129, 308, 119);
fit <- stan(
model_code = model,
data = list(mi = mi, yi = yi, alpha = 0.5, beta = 0.5))
fit;
#Inference for Stan model: 6a01a3b25656e1b18183baf19183abf7.
#4 chains, each with iter=2000; warmup=1000; thin=1;
#post-warmup draws per chain=1000, total post-warmup draws=4000.
#
# mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#p[1] 0.22 0.00 0.05 0.13 0.19 0.22 0.25 0.32 4000 1
#p[2] 0.10 0.00 0.03 0.05 0.07 0.09 0.11 0.16 4000 1
#p[3] 0.18 0.00 0.05 0.09 0.14 0.17 0.21 0.28 4000 1
#p[4] 0.19 0.00 0.05 0.11 0.16 0.19 0.22 0.29 4000 1
#p[5] 0.16 0.00 0.03 0.10 0.14 0.16 0.18 0.22 4000 1
#p[6] 0.26 0.00 0.05 0.17 0.23 0.26 0.30 0.37 4000 1
#p[7] 0.18 0.00 0.04 0.11 0.15 0.17 0.20 0.25 4000 1
#p[8] 0.13 0.00 0.03 0.08 0.11 0.13 0.15 0.20 4000 1
#p[9] 0.11 0.00 0.02 0.08 0.10 0.11 0.13 0.15 4000 1
#p[10] 0.46 0.00 0.04 0.38 0.43 0.46 0.49 0.55 4000 1
#k 0.38 0.00 0.05 0.28 0.34 0.38 0.41 0.47 4000 1
#lp__ -530.01 0.05 2.26 -535.38 -531.33 -529.65 -528.37 -526.69 1782 1
#
#Samples were drawn using NUTS(diag_e) at Tue Apr 24 22:02:07 2018.
#For each parameter, n_eff is a crude measure of effective sample size,
#and Rhat is the potential scale reduction factor on split chains (at
#convergence, Rhat=1).
Comments:
I would move the part involving the calculation of k
into the generated quantities
block; this has to do with the different program blocks being executed at different times. While the transformed parameters
block is executed once every leapfrog step, the generated quantities
block is only executed once per sample draw. So there will be less overhead in recalculating k
. See e.g. here for details. Note that uncertainties from the pi
posterior densities get properly propagated to k
.
You can make use of Stan
s internal max
,min
functions when calculating k
. This will be faster than determining the min/max of pi
with if
conditions, and also eliminates the need to define mn
and mx
.
Upvotes: 2