David

Reputation: 37

Step failure (and iteration limit) in multinomial GAM with mgcv in R

I have a data set with 40 sites (long, lat) by 2 ages by 2 genders by 6 words and 6 dependent sound categories (coded 0 to 5) as simulated here:

sound         <- sample(0:5, size=960, replace=TRUE)
word          <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo', 'foxtrot'), each=4, times=40))
age           <- as.factor(rep(c('young', 'old'), times=480))
gender        <- as.factor(rep(c('female', 'female', 'male', 'male'), times=240))
long          <- rep(runif(40), each=24)
lat           <- rep(runif(40), each=24)
pronunciation <- data.frame(sound, word, age, gender, long, lat)

And I want to run a simple multinomial GAM (for now) with the mgcv package:

library(mgcv)

# named `fit` so the result does not shadow mgcv's multinom() family function
fit <- gam(list(sound ~ word + s(long, lat),
                      ~ word + s(long, lat),
                      ~ word + s(long, lat),
                      ~ word + s(long, lat),
                      ~ word + s(long, lat)),
           data=pronunciation, family=multinom(K=5))

This works fine.

But when I consider the actual probability of my sound categories:

sound <- sample(0:5, size=960, prob=c(0.17, 0.31, 0.21, 0.28, 0.02, 0.03), replace=TRUE)

Something strange happens.

Depending on the simulation run, either no warning or one of the following warnings appears:

Warning message:
In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Iteration limit reached without full convergence - check carefully
Warning message:
In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully

With my actual data only the step failure occurs.

What could cause this step failure? Can I tweak the model to avoid it, or do I have to modify my data?

Upvotes: 0

Views: 237

Answers (1)

user25719676

Reputation: 1

The actual probabilities of your pronunciation categories include very rare occurrences of sound classes 4 and 5. I did not encounter your warnings when fitting a model to simulated data for a number of seeds (10, 11, 12, 100). However, I expect that you ran across these warnings when your random sampling produced especially few observations (possibly zero) of class 4 or class 5, or when the observations of those classes happened to span little to no range in lat/long. You might take a look at your real data to check whether you have sufficient sample sizes for all classes, and look at how those observations are spread across space.
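A minimal sketch of that check in base R, assuming the simulated data from the question (the seed and the names class_counts, sites_per_class and class_by_word are my own illustrative choices, not anything from mgcv):

```r
# Assumption: simulated data as in the question; the seed is arbitrary.
set.seed(10)
pronunciation <- data.frame(
  sound = sample(0:5, size = 960, replace = TRUE,
                 prob = c(0.17, 0.31, 0.21, 0.28, 0.02, 0.03)),
  word  = factor(rep(c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot"),
                     each = 4, times = 40)),
  long  = rep(runif(40), each = 24),
  lat   = rep(runif(40), each = 24)
)

# Keep all six levels so that a class with zero observations still shows up.
sound_class <- factor(pronunciation$sound, levels = 0:5)

# How many observations does each sound class have?
class_counts <- table(sound_class)
print(class_counts)

# At how many distinct sites does each class occur? (NA means the class is absent.)
site <- paste(pronunciation$long, pronunciation$lat)
sites_per_class <- tapply(site, sound_class, function(s) length(unique(s)))
print(sites_per_class)

# Class-by-word counts: empty cells mean a word effect cannot be estimated well.
class_by_word <- table(sound_class, pronunciation$word)
print(class_by_word)
```

With real data, the same three tables should show quickly whether a class is too rare, too spatially concentrated, or missing entirely for some words.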

When encountering fitting warnings for mgcv models, in my experience the best approach is either to simplify the model (here, I would try reducing the knot count for your smooths, i.e. the k argument of s()) or to tweak the model fitting options. To tweak fitting options, see the method and optimizer arguments of gam, and consider increasing the maximum number of iterations using gam.control (see ?gam.control for details). Changing the fitting options won't overcome truly insufficient data (e.g., zero class 5 observations), but the default solvers sometimes struggle when the data are sufficient but sparse, while alternative options might work fine. Note that I haven't carried out multinomial modeling with mgcv, so there may be additional considerations here.
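A rough sketch of both suggestions combined, assuming the simulated data from the question. The seed, k = 10 and maxit = 500 are arbitrary illustrative values rather than recommendations, and fit_small is just a name I picked. As far as I can tell from ?gam.control, maxit limits the IRLS iterations, so it may or may not silence a warning raised by the outer Newton optimizer:

```r
library(mgcv)

# Assumption: simulated data as in the question; the seed is arbitrary.
set.seed(10)
pronunciation <- data.frame(
  sound = sample(0:5, size = 960, replace = TRUE,
                 prob = c(0.17, 0.31, 0.21, 0.28, 0.02, 0.03)),
  word  = factor(rep(c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot"),
                     each = 4, times = 40)),
  long  = rep(runif(40), each = 24),
  lat   = rep(runif(40), each = 24)
)

# Same model as in the question, but with a smaller basis dimension (k = 10)
# for each spatial smooth and a higher iteration limit passed via gam.control().
fit_small <- gam(list(sound ~ word + s(long, lat, k = 10),
                            ~ word + s(long, lat, k = 10),
                            ~ word + s(long, lat, k = 10),
                            ~ word + s(long, lat, k = 10),
                            ~ word + s(long, lat, k = 10)),
                 data = pronunciation, family = multinom(K = 5),
                 control = gam.control(maxit = 500))

summary(fit_small)
```

If the warnings persist with a small k, that points toward the data (rare classes) rather than the fitting options.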

In the worst case, you might consider dropping one or both of your very rare pronunciations from this specific analysis. For an extreme example, if you had only one observation of class 5 in total, it's probably not reasonable or interesting to fit a two-dimensional smooth across latitude and longitude to predict class 5 sounds. Dropping one or more response categories (and the associated data) from the analysis may be disappointing, but it would allow you to fit this relatively complex model to your more frequent categories (and depending on your goals, perhaps a simpler model incorporating classes 4 and 5 could fill in the gaps). Based on the probabilities listed in your sample() call, I expect that a model for just classes 0:3 will have no issues. In fact, I would suggest this as a very first step, just to confirm that the limited data in classes 4 and 5 is the cause of the warning.
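A minimal sketch of that first check, assuming the simulated data from the question with an arbitrary seed. It drops the two rare classes (coded 4 and 5), refits with multinom(K = 3) and three formulas (the family expects categories coded 0 to K, one formula per non-reference category), and collects any warnings so they can be inspected. The names common, fit_common and warns are my own:

```r
library(mgcv)

# Assumption: simulated data as in the question; the seed is arbitrary.
set.seed(10)
pronunciation <- data.frame(
  sound = sample(0:5, size = 960, replace = TRUE,
                 prob = c(0.17, 0.31, 0.21, 0.28, 0.02, 0.03)),
  word  = factor(rep(c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot"),
                     each = 4, times = 40)),
  long  = rep(runif(40), each = 24),
  lat   = rep(runif(40), each = 24)
)

# Keep only the four frequent classes, which are already coded 0 to 3.
common <- subset(pronunciation, sound <= 3)

# Refit with K = 3 and record any warnings instead of just printing them.
warns <- character(0)
fit_common <- withCallingHandlers(
  gam(list(sound ~ word + s(long, lat),
                 ~ word + s(long, lat),
                 ~ word + s(long, lat)),
      data = common, family = multinom(K = 3)),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warns)  # an empty vector means the fit raised no warnings
```

If the full model warns and this reduced model does not, the rare classes are the likely culprit.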

Upvotes: 0
