luke.sonnet

Reputation: 475

How do I find the maximum likelihood of a specific multivariate normal log likelihood in R?

I'm having trouble optimizing a multivariate normal log-likelihood in R. Specifically, I cannot seem to keep the variance-covariance matrix positive-definite and the parameters in a reasonable range. If anyone has a good approach, please let me know.

Let me introduce the problem more completely. I am essentially trying to simultaneously solve these two regression equations using MLE:

$$ y_1 = \beta_1 + \beta_2 x_1 + \beta_3 x_2 \\ y_2 = \beta_4 + \beta_3 x_1 + \beta_5 x_2 $$

The fact that $\beta_3$ appears in both equations is not a mistake. I am trying to solve this by maximizing the likelihood of the multivariate normal distribution for $Y = (y_1, y_2)^\top$, with the mean parameterized by the regression equations above.
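For reference, the objective I am maximizing is the multivariate normal log-likelihood (up to an additive constant):

$$ \ell(\beta, \Sigma) = -\frac{n}{2} \log \det \Sigma - \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu_i)^\top \Sigma^{-1} (Y_i - \mu_i), $$

where $\mu_i$ stacks the two regression equations evaluated at observation $i$.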

I've attached the log-likelihood function as I believe it should be, where I constrain the variance-covariance matrix to be positive-definite by rebuilding it from exponentiated (hence strictly positive) diagonal terms plus a Cholesky-style factor $LL^\top$.

mvrestricted_ll <- function(par, Y, X) {

  # Indices
  n <- nrow(X)
  nbetas <- (2 + 3 * (ncol(Y) - 1))

  # Extract parameters
  beta <- par[1:nbetas]
  eigvals <- exp(par[(nbetas + 1):(nbetas + ncol(Y))]) # constrain to be positive
  chole <- par[(nbetas + ncol(Y) + 1):(nbetas + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2)]

  # Build Sigma from positive eigenvalues and cholesky (should be pos def)
  L <- diag(ncol(Y))
  L[lower.tri(L, diag = TRUE)] <- chole
  Sigma <- diag(eigvals) + tcrossprod(L)

  # Linear predictor
  # Hard coded for 2x2 example for now
  mu <- cbind(beta[1] + beta[2]*X[,1] + beta[3]*X[,2],
              beta[4] + beta[3]*X[,1] + beta[5]*X[,2])

  yminmu <- Y - mu

  nlogs <- n * log(det(Sigma))

  invSigma <- solve(Sigma)

  meat <- yminmu %*% tcrossprod(invSigma, yminmu)

  return(- nlogs - sum(diag(meat)))
}
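
As a quick sanity check on the construction above (a minimal sketch with arbitrary made-up values; check_L and check_Sigma are just illustrative names), a positive diagonal plus $LL^\top$ is always positive-definite:

check_L <- diag(2)
check_L[lower.tri(check_L, diag = TRUE)] <- c(0.5, -1.2, 0.3)  # arbitrary values
check_Sigma <- diag(exp(c(-1, 2))) + tcrossprod(check_L)
eigen(check_Sigma)$values  # both eigenvalues are strictly positive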

# Create fake data
n <- 1000
p <- 2
set.seed(20160201)
X <- matrix(rnorm(n*p), nrow = n)
set.seed(20160201)  # note: reusing the same seed makes Y identical to X
Y <- matrix(rnorm(n*p), nrow = n)

# Initialize parameters
initpars <- c(rep(0, (2 + 3 * (ncol(Y) - 1)) + ncol(Y) + ncol(Y)*(ncol(Y)+1)/2))
# Optimization fails with BFGS
optim(par = initpars, fn = mvrestricted_ll, X = X, Y = Y, method = "BFGS")
# optim() does not converge with Nelder-Mead; increasing maxit fails as well
optim(par = initpars, fn = mvrestricted_ll, X = X, Y = Y)

Any help would be greatly appreciated.

EDIT: I should note that simply treating the entries of Sigma as free parameters and returning a very large value whenever the matrix is not positive-definite does not work either.

Upvotes: 5

Views: 1324

Answers (2)

Ben Bolker

Reputation: 226182

I have no idea if the code/answer is correct, but

invSigma <- try(solve(Sigma))
if (inherits(invSigma, "try-error")) return(NA)

and running

optim(par = initpars, fn = mvrestricted_ll, X=X, Y=Y,
      control = list(maxit = 1e5))

gets me a little farther to a convergence code of 10 (degenerate Nelder-Mead simplex).

$par
 [1]  1.361612e+01  4.674349e+01 -3.050170e+01  3.305013e+01  6.731194e+01
 [6] -3.117192e+01 -5.408598e+00 -6.326897e-07 -1.987449e+01 -1.795924e+01

$value
[1] -1.529013e+19

$counts
function gradient 
    1219       NA 

$convergence
[1] 10

I suspect that a real solution will involve looking more carefully at the code to see if it's really doing what you think it's doing (sorry); understanding why the solve() errors occur might be a good first step. You can troubleshoot this by putting a cat(par, "\n") as the first line of the function and running it without the try/NA-return code. That will let you isolate a parameter vector that triggers the error — then you can work your way through your code a line at a time (with debug() or by hand) to see what's happening.
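For example (a minimal sketch; the traced wrapper is just an untested illustration of the idea):

mvrestricted_ll_traced <- function(par, Y, X) {
  cat(par, "\n")  # print each parameter vector the optimizer tries
  mvrestricted_ll(par, Y, X)
}
optim(par = initpars, fn = mvrestricted_ll_traced, X = X, Y = Y)

The last line printed before the solve() error is the offending parameter vector; rebuild Sigma from it by hand (or step through mvrestricted_ll with debug()) to see where things go wrong.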

Upvotes: 1

Emmanuel Hamel

Reputation: 2213

You can consider using the following approach:

library(DEoptim)

fn <- function(par, mat_X, mat_Y)
{
  X <- mat_X
  Y <- mat_Y
  n <- nrow(X)
  nbetas <- 2 + 3 * (ncol(Y) - 1)

  # Extract parameters (same layout as in the question)
  beta <- par[1:nbetas]
  eigvals <- exp(par[(nbetas + 1):(nbetas + ncol(Y))])  # positive by construction
  chole <- par[(nbetas + ncol(Y) + 1):(nbetas + ncol(Y) + ncol(Y) * (ncol(Y) + 1) / 2)]

  # Rebuild Sigma as a positive diagonal plus a Cholesky-style factor
  L <- diag(ncol(Y))
  L[lower.tri(L, diag = TRUE)] <- chole
  Sigma <- diag(eigvals) + tcrossprod(L)

  # Linear predictor (hard-coded for the two-equation example)
  mu <- cbind(beta[1] + beta[2] * X[, 1] + beta[3] * X[, 2],
              beta[4] + beta[3] * X[, 1] + beta[5] * X[, 2])

  yminmu <- Y - mu
  nlogs <- n * log(det(Sigma))

  # Penalize parameter vectors for which Sigma cannot be inverted
  invSigma <- tryCatch(solve(Sigma), error = function(e) NA)
  if (is.null(dim(invSigma))) return(10^30)

  meat <- yminmu %*% tcrossprod(invSigma, yminmu)
  log_Lik <- -nlogs - sum(diag(meat))

  # Penalize non-finite objectives; DEoptim minimizes, so return -log_Lik
  if (is.na(log_Lik) || is.nan(log_Lik) || is.infinite(log_Lik)) return(10^30)

  -log_Lik
}

n <- 1000
p <- 2

set.seed(20160201)
mat_X <- matrix(rnorm(n * p), nrow = n)

set.seed(2436537)
mat_Y <- matrix(rnorm(n * p), nrow = n)

# 10 parameters: 5 betas + 2 log-scale diagonal terms + 3 Cholesky entries
lower <- rep(-10, 10)
upper <- rep(10, 10)
DEoptim(fn = fn, lower = lower, upper = upper, 
        control = list(itermax = 10000, parallelType = 1), mat_X = mat_X, mat_Y = mat_Y)
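
If you capture the result (res <- DEoptim(...) with the same arguments as above), the fitted parameters and objective value can be read off as:

res$optim$bestmem  # best parameter vector found
res$optim$bestval  # objective (the negative log-likelihood) at that vector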

Upvotes: 0
