swetharevanur

Reputation: 99

R: mix() in mixdist package returning error

I have installed the mixdist package in R to combine distributions. Specifically, I'm using the mix() function (see the documentation). I keep getting this error:

Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, : missing value in parameter

I googled the error message, but no useful results popped up.

My first argument to mix() is a data frame called data.df. It is formatted exactly like the built-in data set pike65. I also did data.df <- as.mixdata(data.df).

My second argument has two rows. It is a data frame called datapar, formatted exactly like pikepar. My pi values are 0.5 and 0.5. My mu values are 250 and 463 (based on my data set). My sigma values are 0.5 and 1.

My call to mix() looks like:
fitdata <- mix(data.df, datapar, "norm", constr = mixconstr(consigma="CCV"), emsteps = 3, print.level = 2)

The printing shows that my pi values go from 0.5 to NaN after the first iteration, and that my gradient is becoming 0.

I would appreciate any help in sorting out this error.

Thanks,
n.i.

Upvotes: 4

Views: 1691

Answers (3)

Cedric

Reputation: 2474

In addition, you can get this message if you have missing data in your dataset.

Using the package's example data:

data(pike65)
data(pikepar)
pike65$freq[10] <- NA
fitpike1 <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)

Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, : missing value in parameter
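To rule this cause out before calling mix(), a quick base-R check is enough. The helper find_na_rows() below is hypothetical (it is not part of mixdist), and the toy data frame only mimics the two-column bin/frequency layout. It just reports the offending rows, because how to fix them depends on your data.

```r
# Hypothetical helper (not part of mixdist): report rows of a grouped data
# frame that contain NA before it is passed to as.mixdata() / mix().
find_na_rows <- function(df) {
  bad <- which(!complete.cases(df))
  if (length(bad) > 0) {
    message("Rows with missing values: ", paste(bad, collapse = ", "))
  }
  bad
}

# Made-up grouped data: bin boundary and frequency, with one missing count
toy <- data.frame(bin = c(20, 22, 24, 26), freq = c(3, NA, 7, 2))
na_rows <- find_na_rows(toy)  # emits the message "Rows with missing values: 2"
```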

Upvotes: 2

Mikko

Reputation: 7755

Now, I am not an expert in mixture distributions, but I think @MrFlick's accepted answer is somewhat misleading for anyone googling the error message (although it is no doubt correct for the example he gave). The core problem is that in both your linked code and your example, the sigma values are very small compared to the mu values. I think the algorithm simply cannot find a solution from such small starting sigma values. If you increase the sigma values, you will get a solution. Using the linked code as an example:

library(mixdist) 
time <- seq(673,723) 
counts <- c(3, 12, 8, 12, 18, 24, 39, 48, 64, 88, 101, 132, 198, 253, 331, 419, 563, 781, 1134, 1423, 1842, 2505, 374, 6099, 9343, 13009, 15097, 13712, 9969, 6785, 4742, 3626, 3794, 4737, 5494, 5656, 4806, 3474, 2165, 1290, 799, 431, 213, 137, 66, 57, 41, 35, 27, 27, 27) 
data.df <- data.frame(time=time, counts=counts) 
data.mix <- as.mixdata(data.df) 
startparam <- mixparam(mu = c(699,707), sigma = 1) 
data.fit <- mix(data.mix, startparam, "norm") ## Leads to the error message 

startparam <- mixparam(mu = c(699,707), sigma = 5) # Adjust start parameters
data.fit <- mix(data.mix, startparam, "norm")
plot(data.fit)
data.fit ### Estimates somewhat reasonable mixture distributions
# Parameters:
#     pi    mu sigma
# 1 0.853 699.3 4.494
# 2 0.147 708.6 2.217


Bottom line: if you increase the starting sigma values, mix() may find reasonable estimates for you. You do not necessarily have to switch to another package.
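If you do not know a good starting sigma in advance, you could try a few values in turn and keep the first one that fits without an error. This is a minimal sketch: fit_with_sigma_retry() and mock_fit() are hypothetical, and the sigma grid is arbitrary. The commented line shows how the mix() call from this answer could be plugged in; the runnable part uses a stand-in function so it works without mixdist.

```r
# Hypothetical helper: call fit_fun(sigma) for each candidate starting sigma
# and return the first fit that does not throw an error.
fit_with_sigma_retry <- function(fit_fun, sigmas) {
  for (s in sigmas) {
    fit <- tryCatch(fit_fun(s), error = function(e) NULL)
    if (!is.null(fit)) return(list(sigma = s, fit = fit))
  }
  stop("no starting sigma in the supplied grid produced a fit")
}

# With mixdist loaded, fit_fun could wrap the call from this answer, e.g.
# function(s) mix(data.mix, mixparam(mu = c(699, 707), sigma = s), "norm")

# Stand-in fitting function for demonstration: fails for sigma < 3
mock_fit <- function(s) if (s < 3) stop("missing value in parameter") else s^2
res <- fit_with_sigma_retry(mock_fit, c(1, 2, 5, 10))
res$sigma  # 5
```

A successful fit is still worth checking by eye (e.g. with plot()), since a start that merely avoids the error is not guaranteed to give a sensible mixture.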

Upvotes: 5

MrFlick

Reputation: 206576

Using the test data you linked to:

library(mixdist) 
time <- seq(673,723) 
counts <-c(3,12,8,12,18,24,39,48,64,88,101,132,198,253,331,
   419,563,781,1134,1423,1842,2505,374,6099,9343,13009, 
   15097,13712,9969,6785,4742,3626,3794,4737,5494,5656,4806,
   3474,2165,1290,799,431,213,137,66,57,41,35,27,27,27) 
data.df <- data.frame(time=time, counts=counts) 
data.mix <- as.mixdata(data.df)  # mix() expects grouped data of class mixdata

We can see that

startparam <- mixparam(mu = c(699, 707), sigma = 1)
data.fit <- mix(data.mix, startparam, "norm") 

gives the same error. This error appears to be closely tied to the data, so the reason this data fails could be different from the reason yours fails, but this is the only example you offered.

The problem with this data is that the probabilities for the two groups become indistinguishable at some point. When that happens, the "E" step of the algorithm cannot estimate the pi variable properly. Here:

pnorm(717,707,1)
# [1] 1
pnorm(717,699,1)
# [1] 1

Both are exactly 1, and this seems to be causing the error. When mix() takes 1 minus this value and uses the ratio to estimate group membership, it gets NaN values, which are propagated to the estimated proportions. When these NaN values are passed internally to nlm() for the estimation, you get the error message

Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist,  : 
  missing value in parameter
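The numeric mechanism can be reproduced in base R alone. This sketch does not copy mixdist's internal code; it only assumes, as described above, that a "1 minus pnorm()" value is turned into a ratio. It shows that the information is lost at the subtraction: asking pnorm() for the upper tail directly (lower.tail = FALSE) still gives tiny but nonzero probabilities.

```r
# Lower-tail probabilities at 717 under each starting component (sd = 1)
p1 <- pnorm(717, mean = 707, sd = 1)   # z = 10
p2 <- pnorm(717, mean = 699, sd = 1)   # z = 18
print(c(p1, p2) == 1)                  # both are exactly 1 in double precision

# "1 minus" these values leaves 0 for both, so the ratio is 0 / 0
tail1 <- 1 - p1
tail2 <- 1 - p2
share <- tail1 / (tail1 + tail2)
print(share)                           # NaN

# Requesting the upper tail directly keeps the tiny, nonzero probabilities
tail1_direct <- pnorm(717, mean = 707, sd = 1, lower.tail = FALSE)
tail2_direct <- pnorm(717, mean = 699, sd = 1, lower.tail = FALSE)
share_direct <- tail1_direct / (tail1_direct + tail2_direct)
print(share_direct)                    # a finite number, not NaN
```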

The same error message can be replicated with

f <- function(x) sum((x-1:length(x))^2)
nlm(f, c(10,10))
nlm(f, c(10,NaN)) #error
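Because the nlm() call happens inside mix(), you cannot intercept it there, but in your own code a small guard makes the cause explicit instead of surfacing as "missing value in parameter". nlm_checked() is a hypothetical wrapper, not part of base R or mixdist; f is the same test function as above, written with seq_along().

```r
# Hypothetical wrapper: refuse to start nlm() from non-finite values and
# report which parameter positions are bad.
nlm_checked <- function(f, p, ...) {
  bad <- which(!is.finite(p))
  if (length(bad) > 0) {
    stop("non-finite starting value(s) at position(s): ",
         paste(bad, collapse = ", "))
  }
  nlm(f, p, ...)
}

f <- function(x) sum((x - seq_along(x))^2)
ok <- nlm_checked(f, c(10, 10))
ok$estimate    # approximately c(1, 2)
err <- tryCatch(nlm_checked(f, c(10, NaN)), error = conditionMessage)
err            # "non-finite starting value(s) at position(s): 2"
```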

So it appears the mixdist package will not work in this scenario. You may wish to contact the package maintainer to see whether they are aware of the problem. In the meantime, you will need to find another way to estimate the parameters of your mixture model.

Upvotes: 5
