icedcoffee
icedcoffee

Reputation: 1015

Calculating the log-likelihood of a set of observations sampled from a mixture of two normal distributions using R

I wrote a function to calculate the log-likelihood of a set of observations sampled from a mixture of two normal distributions. This function is not giving me the correct answer.

I will not know which of the two distributions any given sample is from, so the function needs to sum over possibilities.

This function takes a vector of five model parameters as its first argument (μ1, σ1​, μ2​, σ2​ and p) where μi​ and σi​ are the mean and standard deviation of the ith distribution and p is the probability a sample is from the first distribution. For the second argument, the function takes a vector of observations.

I have written the following function:

mixloglik <- function(p, v) {
    sum(log(dnorm(v, p[1], p[2])*p[5] + dnorm(v,p[3],p[4]))*p[5])
}

I can create test data, for which I know the solution should be ~ -854.6359:

set.seed(42)
v<- c(rnorm(100), rnorm(200, 8, 2))
p <- c(0, 1, 6, 2, 0.5)

When I test this function on the test data I do not get the correct solution

> mixloglik(p, v)
[1] -356.7194

I know the solution should be ~ -854.6359. Where am I going wrong in my function?

Upvotes: 1

Views: 839

Answers (1)

Rui Barradas
Rui Barradas

Reputation: 76402

The correct expression for the log-likelihood is the following.

mixloglik <- function(p, v) {
  sum(log(p[5]*dnorm(v, p[1], p[2]) + (1 - p[5])*dnorm(v, p[3], p[4])))
}

Now try it:

set.seed(42)
v<- c(rnorm(100), rnorm(200, 8, 2))
p <- c(0, 1, 6, 2, 0.5)
mixloglik(p, v)
#[1] -854.6359

In cases like this, the best way to solve the error is to restart by rewriting the expression on paper and recode it.

Upvotes: 3

Related Questions