Jean_N
Jean_N

Reputation: 529

Problem with simple numerical estimation for MLE of multinomial in R

I am trying to set up a simple numerical MLE estimation of a multinomial distribution.

The multinomial has one constraint - all the cell probabilities need to add up to one.

Usually the way to have this constraint is to re-express one of the probabilities as (1 - sum of the others)

When I run this however, I have a problem as during the optimization procedure, I might have logarithm of a negative value.

Any thoughts of how to fix this? I tried using another optimization package (Rsolnp) and it worked, but I am trying to make it work with the simple default R optim in order to avoid constrained/nonlinear optimization.

Here is my code (I know that I can get the result in this particular case analytically, but this is a toy example, my actual problem is bigger than this here).

set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))
N <- test_data
loglik_function <- function(theta){
  output <- -1*(N[1]*log(theta[1]) + N[2]*log(theta[2]) + N[3]*log(theta[3]) + N[4]*log(1- sum(theta)))
  return(output)
}

startval <- rep(0.1, 3)

my_optim <- optim(startval, loglik_function, lower = 0.0001, upper = 0.9999, method = "L-BFGS-B")

Any thoughts or help would be very much appreciated. Thanks

Upvotes: 1

Views: 626

Answers (1)

Maurits Evers
Maurits Evers

Reputation: 50738

Full heads-up: I know you asked about (constrained) ML estimation, but how about doing this the Bayesian way à la Stan/rstan. I will remove this if it's not useful/missing the point.

  1. The model is only a few lines of code.

    library(rstan)
    
    model_code <- "
    data {
        int<lower=1> K;           // number of choices
        int<lower=0> y[K];        // observed choices
    }
    
    parameters {
        simplex[K] theta;         // simplex of probabilities, one for every choice
    }
    
    model {
        // Priors
        theta ~ cauchy(0, 2.5);   // weakly informative
        // Likelihood
        y ~ multinomial(theta);
    }
    
    generated quantities {
        real ratio;
        ratio = theta[1] / theta[2];
    }
    "
    

    You can see how easy it is to implement the simplex constraint on the thetas using the Stan data type simplex. In the Stan language, simplex allows you to easily implement a probability (unit) simplex

    enter image description here

    where K denotes the number of parameters (here: choices).

    Also note how we use the generated quantities code block, to calculate derived quantities (here ratio) based on the parameters (here theta[1] and theta[2]). Since we have access to the posterior distributions of all parameters, calculating the distribution of derived quantities is trivial.

  2. We then fit the model to your test_data

    fit <- stan(model_code = model_code, data = list(K = 4, y = test_data[, 1]))
    

    and show a summary of the parameter estimates

    summary(fit)$summary
    #                 mean      se_mean         sd          2.5%           25%
    #theta[1]     0.2379866 0.0002066858 0.01352791     0.2116417     0.2288498
    #theta[2]     0.26 20013 0.0002208638 0.01365478     0.2358731     0.2526111
    #theta[3]     0.2452539 0.0002101333 0.01344665     0.2196868     0.2361817
    #theta[4]     0.2547582 0.0002110441 0.01375618     0.2277589     0.2458899
    #ratio        0.9116350 0.0012555320 0.08050852     0.7639551     0.8545142
    #lp__     -1392.6941655 0.0261794859 1.19050097 -1395.8297494 -1393.2406198
    #                   50%           75%         97.5%    n_eff      Rhat
    #theta[1]     0.2381541     0.2472830     0.2645305 4283.904 0.9999816
    #theta[2]     0.2615782     0.2710044     0.2898404 3822.257 1.0001742
    #theta[3]     0.2448304     0.2543389     0.2722152 4094.852 1.0007501
    #theta[4]     0.2545946     0.2638733     0.2822803 4248.632 0.9994449
    #ratio        0.9078901     0.9648312     1.0764747 4111.764 0.9998184
    #lp__     -1392.3914998 -1391.8199477 -1391.3274885 2067.937 1.0013440
    

    as well as a plot showing point estimates and CIs for the theta parameters

    plot(fit, pars = "theta")
    

    enter image description here


Update: Constrained ML estimation using maxLik

You can in fact implement constrained ML estimation using methods provided by the maxLik library. I found it a bit "fiddly", because convergence seems to be quite sensitive to changes in the starting values and the optimisation method used.

For what it's worth, here is a reproducible example:

library(maxLik)

x <- test_data[, 1]

Define the log-likelihood function for a multinomial distribution; I've included an if statement here to prevent theta < 0 cases from throwing an error.

loglik <- function(theta, x)
    if (all(theta > 0)) sum(dmultinom(x, prob = theta, log = TRUE)) else 0

I use the Nelder-Mead optimisation method here to find the maximum of the log-likelihood function. The important bit here is the constraints argument that implements a constraint in the form of the equality A theta + B = 0, see ?maxNM for details and examples.

res <- maxNM(
    loglik,
    start = rep(0.25, length(x)),
    constraints = list(
        eqA = matrix(rep(1, length(x)), ncol = length(x)),
        eqB = -1),
    x = x)

We can inspect the results

summary(res)
--------------------------------------------
Nelder-Mead maximization
Number of iterations: 111
Return code: 0
successful convergence
Function value: -10.34576
Estimates:
      estimate     gradient
[1,] 0.2380216 -0.014219040
[2,] 0.2620168  0.012664714
[3,] 0.2450181  0.002736670
[4,] 0.2550201 -0.002369234

Constrained optimization based on SUMT
Return code: 1
penalty close to zero
1  outer iterations, barrier value 5.868967e-09
--------------------------------------------

and confirm that indeed the sum of the estimates equals 1 (within accuracy)

sum(res$estimate)
#[1] 1.000077

Sample data

set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))

Upvotes: 3

Related Questions