Adam Ryczkowski
Adam Ryczkowski

Reputation: 8064

The multinomial model in stan - how to fit dirichlet distribution parameters?

I'd like to learn how to use the Dirichlet distribution in stan.

I've got a table with total number of observations of each of the six levels of the factor variable:

     counts    n     factor_var 
    -------- ------ ------------
      3710    4386    level 1   
      252     4386    level 2   
      332     4386    level 3   
       59     4386    level 4   
       29     4386    level 5   
       4      4386    level 6  

I want simply to estimate the dirichlet parameters, i.e. the probabilities that the a value comes from the given level. I know it can be trivially done in R (gtools::rdirichlet(number_of_samples, df$counts)), but I am eventually aiming at hierarchical model based on that estimates.


Here is a stan model I've got so far. The model doesn't compile, because the result of the categorical distribution is a discrete number from <1, n_levels>.

    data {
      int<lower=1> n_levels;
      int counts[n_levels];
    }

    parameters {
      vector<lower=0>[n_levels] priors;
      simplex[n_levels] level_p;
    }

    model {
      level_p ~ dirichlet(priors);
      counts ~ categorical_logit(level_p);
    }

Here is the R code:

    df <-
      data.frame(
        counts=c(3710, 252, 332, 59, 29, 4),
        n = c(4386, 4386, 4386, 4386, 4386, 4386),
        factor_var = factor(1:6, labels=paste0('level ', 1:6)))


    data<-list(
      n_levels=6,
      counts<-df$counts
    )

    fit1<-stan(
      file='model.stan')

When running it, after compilation, I've got the following error:

    SAMPLING FOR MODEL 'model' NOW (CHAIN 1).
    Rejecting initial value:
      Error evaluating the log probability at the initial value.
    Exception: categorical_logit_lpmf: categorical outcome out of support is 3710, but must be in the interval [1, 6]  (in 'model6fffc8dd594_proba_stan' at line 13)

    ...

    Initialization between (-2, 2) failed after 100 attempts. 
     Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
    [1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
    error occurred during calling the sampler; sampling not done

Upvotes: 4

Views: 4060

Answers (1)

Bob Carpenter
Bob Carpenter

Reputation: 3753

You need to use a multinomial distribution.

categorical_logit assumes log odds parameters, not a simplex.

There are examples of how to fit a Dirichlet in the manual, including some generalized priors.

Upvotes: 3

Related Questions