Reputation: 529
I am trying to set up a simple numerical MLE estimation of a multinomial distribution.
The multinomial has one constraint - all the cell probabilities need to add up to one.
Usually the way to have this constraint is to re-express one of the probabilities as (1 - sum of the others)
When I run this however, I have a problem as during the optimization procedure, I might have logarithm of a negative value.
Any thoughts of how to fix this? I tried using another optimization package (Rsolnp) and it worked, but I am trying to make it work with the simple default R optim in order to avoid constrained/nonlinear optimization.
Here is my code (I know that I can get the result in this particular case analytically, but this is a toy example, my actual problem is bigger than this here).
set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))
N <- test_data
loglik_function <- function(theta){
output <- -1*(N[1]*log(theta[1]) + N[2]*log(theta[2]) + N[3]*log(theta[3]) + N[4]*log(1- sum(theta)))
return(output)
}
startval <- rep(0.1, 3)
my_optim <- optim(startval, loglik_function, lower = 0.0001, upper = 0.9999, method = "L-BFGS-B")
Any thoughts or help would be very much appreciated. Thanks
Upvotes: 1
Views: 626
Reputation: 50738
Full heads-up: I know you asked about (constrained) ML estimation, but how about doing this the Bayesian way à la Stan/rstan
. I will remove this if it's not useful/missing the point.
The model is only a few lines of code.
library(rstan)
model_code <- "
data {
int<lower=1> K; // number of choices
int<lower=0> y[K]; // observed choices
}
parameters {
simplex[K] theta; // simplex of probabilities, one for every choice
}
model {
// Priors
theta ~ cauchy(0, 2.5); // weakly informative
// Likelihood
y ~ multinomial(theta);
}
generated quantities {
real ratio;
ratio = theta[1] / theta[2];
}
"
You can see how easy it is to implement the simplex constraint on the theta
s using the Stan data type simplex
. In the Stan language, simplex
allows you to easily implement a probability (unit) simplex
where K denotes the number of parameters (here: choices).
Also note how we use the generated quantities
code block, to calculate derived quantities (here ratio
) based on the parameters (here theta[1]
and theta[2]
). Since we have access to the posterior distributions of all parameters, calculating the distribution of derived quantities is trivial.
We then fit the model to your test_data
fit <- stan(model_code = model_code, data = list(K = 4, y = test_data[, 1]))
and show a summary of the parameter estimates
summary(fit)$summary
# mean se_mean sd 2.5% 25%
#theta[1] 0.2379866 0.0002066858 0.01352791 0.2116417 0.2288498
#theta[2] 0.26 20013 0.0002208638 0.01365478 0.2358731 0.2526111
#theta[3] 0.2452539 0.0002101333 0.01344665 0.2196868 0.2361817
#theta[4] 0.2547582 0.0002110441 0.01375618 0.2277589 0.2458899
#ratio 0.9116350 0.0012555320 0.08050852 0.7639551 0.8545142
#lp__ -1392.6941655 0.0261794859 1.19050097 -1395.8297494 -1393.2406198
# 50% 75% 97.5% n_eff Rhat
#theta[1] 0.2381541 0.2472830 0.2645305 4283.904 0.9999816
#theta[2] 0.2615782 0.2710044 0.2898404 3822.257 1.0001742
#theta[3] 0.2448304 0.2543389 0.2722152 4094.852 1.0007501
#theta[4] 0.2545946 0.2638733 0.2822803 4248.632 0.9994449
#ratio 0.9078901 0.9648312 1.0764747 4111.764 0.9998184
#lp__ -1392.3914998 -1391.8199477 -1391.3274885 2067.937 1.0013440
as well as a plot showing point estimates and CIs for the theta
parameters
plot(fit, pars = "theta")
maxLik
You can in fact implement constrained ML estimation using methods provided by the maxLik
library. I found it a bit "fiddly", because convergence seems to be quite sensitive to changes in the starting values and the optimisation method used.
For what it's worth, here is a reproducible example:
library(maxLik)
x <- test_data[, 1]
Define the log-likelihood function for a multinomial distribution; I've included an if
statement here to prevent theta < 0
cases from throwing an error.
loglik <- function(theta, x)
if (all(theta > 0)) sum(dmultinom(x, prob = theta, log = TRUE)) else 0
I use the Nelder-Mead optimisation method here to find the maximum of the log-likelihood function. The important bit here is the constraints
argument that implements a constraint in the form of the equality A theta + B = 0
, see ?maxNM
for details and examples.
res <- maxNM(
loglik,
start = rep(0.25, length(x)),
constraints = list(
eqA = matrix(rep(1, length(x)), ncol = length(x)),
eqB = -1),
x = x)
We can inspect the results
summary(res)
--------------------------------------------
Nelder-Mead maximization
Number of iterations: 111
Return code: 0
successful convergence
Function value: -10.34576
Estimates:
estimate gradient
[1,] 0.2380216 -0.014219040
[2,] 0.2620168 0.012664714
[3,] 0.2450181 0.002736670
[4,] 0.2550201 -0.002369234
Constrained optimization based on SUMT
Return code: 1
penalty close to zero
1 outer iterations, barrier value 5.868967e-09
--------------------------------------------
and confirm that indeed the sum of the estimates equals 1 (within accuracy)
sum(res$estimate)
#[1] 1.000077
set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))
Upvotes: 3