Problem with simple numerical estimation for MLE of multinomial in R

Question

I am trying to set up a simple numerical MLE estimation of a multinomial distribution.

The multinomial has one constraint - all the cell probabilities need to add up to one.

Usually the way to have this constraint is to re-express one of the probabilities as (1 - sum of the others)

When I run this however, I have a problem as during the optimization procedure, I might have logarithm of a negative value.

Any thoughts of how to fix this? I tried using another optimization package (Rsolnp) and it worked, but I am trying to make it work with the simple default R optim in order to avoid constrained/nonlinear optimization.

Here is my code (I know that I can get the result in this particular case analytically, but this is a toy example, my actual problem is bigger than this here).

set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))
N <- test_data
loglik_function <- function(theta){
  output <- -1*(N[1]*log(theta[1]) + N[2]*log(theta[2]) + N[3]*log(theta[3]) + N[4]*log(1- sum(theta)))
  return(output)
}

startval <- rep(0.1, 3)

my_optim <- optim(startval, loglik_function, lower = 0.0001, upper = 0.9999, method = "L-BFGS-B")

Any thoughts or help would be very much appreciated. Thanks

Maurits Evers · Accepted Answer

Full heads-up: I know you asked about (constrained) ML estimation, but how about doing this the Bayesian way à la Stan/rstan. I will remove this if it's not useful/missing the point.

The model is only a few lines of code.
```
library(rstan)

model_code <- "
data {
    int K;           // number of choices
    int y[K];        // observed choices
}

parameters {
    simplex[K] theta;         // simplex of probabilities, one for every choice
}

model {
    // Priors
    theta ~ cauchy(0, 2.5);   // weakly informative
    // Likelihood
    y ~ multinomial(theta);
}

generated quantities {
    real ratio;
    ratio = theta[1] / theta[2];
}
"
```
You can see how easy it is to implement the simplex constraint on the thetas using the Stan data type simplex. In the Stan language, simplex allows you to easily implement a probability (unit) simplex

where K denotes the number of parameters (here: choices).

Also note how we use the generated quantities code block, to calculate derived quantities (here ratio) based on the parameters (here theta[1] and theta[2]). Since we have access to the posterior distributions of all parameters, calculating the distribution of derived quantities is trivial.

We then fit the model to your test_data

fit <- stan(model_code = model_code, data = list(K = 4, y = test_data[, 1]))

and show a summary of the parameter estimates

summary(fit)$summary
#                 mean      se_mean         sd          2.5%           25%
#theta[1]     0.2379866 0.0002066858 0.01352791     0.2116417     0.2288498
#theta[2]     0.26 20013 0.0002208638 0.01365478     0.2358731     0.2526111
#theta[3]     0.2452539 0.0002101333 0.01344665     0.2196868     0.2361817
#theta[4]     0.2547582 0.0002110441 0.01375618     0.2277589     0.2458899
#ratio        0.9116350 0.0012555320 0.08050852     0.7639551     0.8545142
#lp__     -1392.6941655 0.0261794859 1.19050097 -1395.8297494 -1393.2406198
#                   50%           75%         97.5%    n_eff      Rhat
#theta[1]     0.2381541     0.2472830     0.2645305 4283.904 0.9999816
#theta[2]     0.2615782     0.2710044     0.2898404 3822.257 1.0001742
#theta[3]     0.2448304     0.2543389     0.2722152 4094.852 1.0007501
#theta[4]     0.2545946     0.2638733     0.2822803 4248.632 0.9994449
#ratio        0.9078901     0.9648312     1.0764747 4111.764 0.9998184
#lp__     -1392.3914998 -1391.8199477 -1391.3274885 2067.937 1.0013440

as well as a plot showing point estimates and CIs for the theta parameters

plot(fit, pars = "theta")

Update: Constrained ML estimation using `maxLik`

You can in fact implement constrained ML estimation using methods provided by the maxLik library. I found it a bit "fiddly", because convergence seems to be quite sensitive to changes in the starting values and the optimisation method used.

For what it's worth, here is a reproducible example:

library(maxLik)

x <- test_data[, 1]

Define the log-likelihood function for a multinomial distribution; I've included an if statement here to prevent theta < 0 cases from throwing an error.

loglik <- function(theta, x)
    if (all(theta > 0)) sum(dmultinom(x, prob = theta, log = TRUE)) else 0

I use the Nelder-Mead optimisation method here to find the maximum of the log-likelihood function. The important bit here is the constraints argument that implements a constraint in the form of the equality A theta + B = 0, see ?maxNM for details and examples.

res <- maxNM(
    loglik,
    start = rep(0.25, length(x)),
    constraints = list(
        eqA = matrix(rep(1, length(x)), ncol = length(x)),
        eqB = -1),
    x = x)

We can inspect the results

summary(res)
--------------------------------------------
Nelder-Mead maximization
Number of iterations: 111
Return code: 0
successful convergence
Function value: -10.34576
Estimates:
      estimate     gradient
[1,] 0.2380216 -0.014219040
[2,] 0.2620168  0.012664714
[3,] 0.2450181  0.002736670
[4,] 0.2550201 -0.002369234

Constrained optimization based on SUMT
Return code: 1
penalty close to zero
1  outer iterations, barrier value 5.868967e-09
--------------------------------------------

and confirm that indeed the sum of the estimates equals 1 (within accuracy)

sum(res$estimate)
#[1] 1.000077

Sample data

set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))

Problem with simple numerical estimation for MLE of multinomial in R

Answers (1)

Update: Constrained ML estimation using `maxLik`

Sample data

Related Questions

Problem with simple numerical estimation for MLE of multinomial in R

Answers (1)

Update: Constrained ML estimation using maxLik

Sample data

Related Questions

Update: Constrained ML estimation using `maxLik`