Reputation: 1444
I'm trying to understand/reproduce findings using stan. However, I'm stuck somewhere. Am I using the wrong stan model?
library(nnet);library(rstan);library(dplyr);library(tidyr)
#set up data
n <- 100
set.seed(1)
dat <- data.frame(DV = factor(sample(letters[1:5], n, replace = TRUE)),
                  x1 = rnorm(n, mean = 4.5, sd = 1.3),
                  x2 = sample(1:5, n, replace = TRUE, prob = c(0.035, 0.167, 0.083, 0.415, 0.298)),
                  x3 = sample(c(0, 1), n, replace = TRUE, prob = c(.51, .49)),
                  x4 = round(rnorm(n, mean = 48, sd = 15), 0))
library(nnet)
f <- as.formula("DV ~ x1 + x2 + x3 + x4")
out <- multinom(f, data=dat)
#summary(out)
#Store output to compare later on:
out.multinom <- tidyr::gather(as.data.frame(coef(out)), values) %>%
  mutate(option = rep(row.names(coef(out)), 5)) %>%
  mutate(coef = paste0(option, ":", values)) %>%
  dplyr::select(-option, -values) %>%
  rename(multinom = value)
So far so good. Now I want to compare these estimates to the ones from stan. The model is copied and pasted from here:
stan_model <- "
data {
  int K;
  int N;
  int D;
  int y[N];
  matrix[N, D] x;
}
parameters {
  matrix[D, K] beta;
}
model {
  matrix[N, K] x_beta = x * beta;
  to_vector(beta) ~ normal(0, 2);
  for (n in 1:N)
    y[n] ~ categorical_logit(x_beta[n]');
}
"
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
M <- model.matrix(f, dat)
#data for stan
datlist <- list(N = nrow(M),                  #nr of obs
                K = length(unique(dat[,1])),  #possible outcomes
                D = ncol(M),                  #dimension of predictor matrix
                x = M,                        #predictor matrix
                y = as.numeric(dat[,1]))
#estimate model
b.out <- stan(model_code = stan_model,
              data = datlist,
              iter = 1000,
              chains = 4,
              seed = 12591,
              control = list(max_treedepth = 11))
res <- summary(b.out, pars = "beta", probs = .5)$summary %>% as.data.frame
#store
out.stan <- data.frame(beta = rep(colnames(M), each = 5),
                       value.stan = res[,1]) %>%
  mutate(option = rep(levels(dat$DV), length.out = 25)) %>%
  mutate(coef = paste0(option, ":", beta)) %>%
  dplyr::select(-option, -beta)
#compare
merge(out.multinom, out.stan, by="coef", all.y=T)
            coef     multinom   value.stan
1  a:(Intercept)           NA -0.532803345
2           a:x1           NA  0.017020378
3           a:x2           NA  0.227622393
4           a:x3           NA -0.001617129
5           a:x4           NA  0.011291841
6  b:(Intercept) -0.308050243 -0.794106266
7           b:x1  0.314860536  0.353267471
8           b:x2 -0.305248243 -0.094612371
9           b:x3 -0.181849471 -0.203225779
10          b:x4 -0.002589588  0.007308861
11 c:(Intercept)  1.241939293  0.391090113
12          c:x1 -0.265908390 -0.230931524
13          c:x2 -0.121426457  0.113370970
14          c:x3 -0.486321891 -0.496869965
15          c:x4  0.004659122  0.019329331
16 d:(Intercept)  1.655236959  0.767339620
17          d:x1 -0.332715090 -0.291936815
18          d:x2 -0.159596712  0.072145756
19          d:x3  0.132897149  0.145568093
20          d:x4  0.002003899  0.017336138
21 e:(Intercept)  0.970658209  0.253888841
22          e:x1 -0.057914885 -0.017299638
23          e:x2 -0.501888386 -0.306445142
24          e:x3  0.515320410  0.517075558
25          e:x4  0.009115124  0.023669295
Something is not as it should be, since the estimates differ. I'm almost certain that I missed something when simply copying and pasting the stan code. What is it? (I'm not talking about the fact that multinom uses a as the reference category.)
The issue should not lie with the size of the dataset, as my actual analysis uses more data (this example was prepared for Stack Overflow).
Thanks for any help.
Upvotes: 2
Views: 789
Reputation: 3753
Judging from the output, it looks like the multinom function being called uses K-1 coefficients in order to make the model identifiable. That implicitly takes the a coefficients to be zero. If you subtract the a coefficients from the other coefficients, b through e, you get roughly the same result. The results won't be exactly the same, because Stan is giving you posterior means here and multinom is probably giving you something else. There's a discussion of using K versus K-1 coding for multinomials in the Stan User's Guide, in the same section you linked, under the heading "identifiability".
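One quick way to check this (just a sketch; cmp is an assumed name for the merged data frame built at the end of the question) is to recentre the Stan posterior means on the a category within each predictor and compare again:
#recentre the Stan posterior means on the "a" (reference) category
#within each predictor; cmp is an assumed name for the merged
#comparison data frame from the question
cmp <- merge(out.multinom, out.stan, by = "coef", all.y = TRUE)
cmp %>%
  tidyr::separate(coef, into = c("option", "term"), sep = ":") %>%
  group_by(term) %>%
  mutate(stan.identified = value.stan - value.stan[option == "a"]) %>%
  ungroup() %>%
  arrange(term, option)
The stan.identified value for a is then zero by construction, and the rows for b through e should land close to the multinom column, up to the posterior-mean versus maximum-likelihood difference and Monte Carlo error. Alternatively, you could switch the Stan program to the K-1 coding described in that identifiability section, pinning one category's coefficients to zero.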
Upvotes: 2