Reputation: 1444
I'm trying to understand/reproduce findings using stan. However, I'm stuck somewhere. Am I using the wrong stan model?
library(nnet);library(rstan);library(dplyr);library(tidyr)
#set up data
n <- 100
set.seed(1)
dat <- data.frame(DV = factor(sample(letters[1:5], n, replace = TRUE)),
                  x1 = rnorm(n, mean = 4.5, sd = 1.3),
                  x2 = sample(1:5, n, replace = TRUE, prob = c(0.035, 0.167, 0.083, 0.415, 0.298)),
                  x3 = sample(c(0, 1), n, replace = TRUE, prob = c(.51, .49)),
                  x4 = round(rnorm(n, mean = 48, sd = 15), 0))
library(nnet)
f <- as.formula("DV ~ x1 + x2 + x3 + x4")
out <- multinom(f, data=dat)
#summary(out)
#Store output to compare later on:
out.multinom <- tidyr::gather(as.data.frame(coef(out)), values) %>%
  mutate(option = rep(row.names(coef(out)), 5)) %>%
  mutate(coef = paste0(option, ":", values)) %>%
  dplyr::select(-option, -values) %>%
  rename(multinom = value)
So far so good. Now I want to compare these estimates to the ones from stan. The model is copied and pasted from here:
stan_model <- "
data {
  int K;
  int N;
  int D;
  int y[N];
  matrix[N, D] x;
}
parameters {
  matrix[D, K] beta;
}
model {
  matrix[N, K] x_beta = x * beta;
  to_vector(beta) ~ normal(0, 2);
  for (n in 1:N)
    y[n] ~ categorical_logit(x_beta[n]');
}
"
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
M <- model.matrix(f, dat)
#data for stan
datlist <- list(N = nrow(M),                  #nr of obs
                K = length(unique(dat[,1])),  #possible outcomes
                D = ncol(M),                  #dimension of predictor matrix
                x = M,                        #predictor matrix
                y = as.numeric(dat[,1]))
#estimate model
b.out <- stan(model_code = stan_model,
              data = datlist,
              iter = 1000,
              chains = 4,
              seed = 12591,
              control = list(max_treedepth = 11))
res <- summary(b.out, pars = "beta", probs = .5)$summary %>% as.data.frame
#store
out.stan <- data.frame(beta = rep(colnames(M), each = 5),
                       value.stan = res[,1]) %>%
  mutate(option = rep(levels(dat$DV), length.out = 25)) %>%
  mutate(coef = paste0(option, ":", beta)) %>%
  dplyr::select(-option, -beta)
#compare
merge(out.multinom, out.stan, by="coef", all.y=T)
            coef     multinom   value.stan
1  a:(Intercept)           NA -0.532803345
2           a:x1           NA  0.017020378
3           a:x2           NA  0.227622393
4           a:x3           NA -0.001617129
5           a:x4           NA  0.011291841
6  b:(Intercept) -0.308050243 -0.794106266
7           b:x1  0.314860536  0.353267471
8           b:x2 -0.305248243 -0.094612371
9           b:x3 -0.181849471 -0.203225779
10          b:x4 -0.002589588  0.007308861
11 c:(Intercept)  1.241939293  0.391090113
12          c:x1 -0.265908390 -0.230931524
13          c:x2 -0.121426457  0.113370970
14          c:x3 -0.486321891 -0.496869965
15          c:x4  0.004659122  0.019329331
16 d:(Intercept)  1.655236959  0.767339620
17          d:x1 -0.332715090 -0.291936815
18          d:x2 -0.159596712  0.072145756
19          d:x3  0.132897149  0.145568093
20          d:x4  0.002003899  0.017336138
21 e:(Intercept)  0.970658209  0.253888841
22          e:x1 -0.057914885 -0.017299638
23          e:x2 -0.501888386 -0.306445142
24          e:x3  0.515320410  0.517075558
25          e:x4  0.009115124  0.023669295
Something is not as it should be, since the estimates differ. I'm almost certain that I missed something when simply copying and pasting the stan code. What is it? (I'm not talking about the fact that multinom uses a as the reference category.)
The issue should not lie with the size of the dataset, as my actual analysis uses more data (this example was prepared for Stack Overflow).
Thanks for any help.
Upvotes: 2
Views: 789
Reputation: 3753
Judging from the output, it looks like the multinom function being called uses K-1 coefficients in order to make the model identifiable. That implicitly takes the a coefficients to be zero. If you subtract the a coefficients from the other coefficients, b through e, you get roughly the same result. The results won't be exactly the same, because Stan is giving you posterior means here and multinom is probably giving you something else. There's a discussion of using K versus K-1 coding for multinomials in the Stan User's Guide, in the same section you linked, under the heading "identifiability".
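One quick way to check this (just a sketch; cmp is an assumed name for the merged data frame built at the end of the question) is to recentre the Stan posterior means on the a category within each predictor and compare again:
#recentre the Stan posterior means on the "a" (reference) category
#within each predictor; cmp is an assumed name for the merged
#comparison data frame from the question
cmp <- merge(out.multinom, out.stan, by = "coef", all.y = TRUE)
cmp %>%
  tidyr::separate(coef, into = c("option", "term"), sep = ":") %>%
  group_by(term) %>%
  mutate(stan.identified = value.stan - value.stan[option == "a"]) %>%
  ungroup() %>%
  arrange(term, option)
The stan.identified value for a is then zero by construction, and the rows for b through e should land close to the multinom column, up to the posterior-mean versus maximum-likelihood difference and Monte Carlo error. Alternatively, you could switch the Stan program to the K-1 coding described in that identifiability section, pinning one category's coefficients to zero.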
Upvotes: 2