Reputation: 907
I am running a generalized linear model with random effects using the glmer()
function from the package lme4
.
The model code looks like this:
mod6 <- glmer((Ndifference+74337) ~ netm1011 + d1011 +
b0001 + (1|region), Family = Gamma(link = "identity"))
Ndifference
is count data of population differences between 200 and 2010 of 50 states (and DC). There is one negative value (Michigan at -74336) so I added a constant to make sure my response was all positive.
All the predictors (aside from the random effect of region) are proportions or percentages. Netm1011 (rate of immigration to the states in 2010) and d1011 (rate of deaths per 1000 people) both have several negative values. B0001 contains all positive proportions (birth rate/1000 people).
When I run the model I keep getting this error:
Error in eval(substitute(expr), envir, enclos) :
(maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
Before this error theres a crazy amount of output that looks like this::
I've tried different families of distributions as well (Gamma
, inverse.gaussian
). What exactly does this error code mean?
Below is the data I've been using (just the variables involved). Any help would be appreciated!
structure(list(Ndifference = c(333269L, 86445L, 1245536L, 244720L,
3333964L, 725062L, 166537L, 113590L, 33923L, 2791639L, 1484925L,
151993L, 271526L, 404489L, 399906L, 122812L, 167087L, 299231L,
74155L, 50624L, 477090L, 206187L, -74336L, 379644L, 120037L,
392539L, 87578L, 117119L, 685035L, 76962L, 374381L, 243485L,
409910L, 1481681L, 33444L, 178335L, 306232L, 408919L, 428717L,
2533L, 612397L, 60714L, 654646L, 4296180L, 533538L, 16207L, 921089L,
835264L, 46920L, 317348L, 70366L), d1011 = c(0.01009935290407,
0.00482181820219, 0.00740624039708, 0.00988384227183, 0.00640323497813,
0.00628209882119, 0.00812947210436, 0.00872837354861, 0.00764311417232,
0.00915166624244, 0.00737517844004, 0.00718037578961, 0.00746442540795,
0.00794410854005, 0.00889218497298, 0.00923607712106, 0.00850800517833,
0.00983998039872, 0.00904860671746, 0.00978543746728, 0.00752488166029,
0.00814412474047, 0.008998680863, 0.0074466124005, 0.00971662809766,
0.00917030625948, 0.00880861178822, 0.00819753663997, 0.00718370505053,
0.00796602176569, 0.00789025770533, 0.00777472712417, 0.00769648628539,
0.00831202019281, 0.00850432185633, 0.00953304172455, 0.00962020831593,
0.0084093843696, 0.00992588646267, 0.00893168396866, 0.00908595754594,
0.00854178331167, 0.00947807131183, 0.00662702930588, 0.0053663066427,
0.00848516414343, 0.00741560390799, 0.00724357008593, 0.01174960990152,
0.00835051236548, 0.00772546941972), netm1011 = c(0.00109618827436,
0.00189063449694, 0.00284535027555, 0.00218869215636, 0.00200262974559,
0.0065388101588, 0.00074903204593, 0.00531214993154, 0.01546474398708,
0.01046605886554, 0.00346226170766, 0.0039720759906, 0.00110199747387,
-0.00340610876916, -4.63800737643485e-05, 0.00143230827182, -0.0018378102704,
0.00157293366968, 0.00169295518939, 0.00086246831653, 0.00396682929054,
0.00395032406919, -0.00265224491201, 0.00162162050201, -0.0011606066005,
-0.00128783881235, 0.00364476878277, 0.00043559148624, -0.00024040613102,
-0.00066598675772, -8.70119549428016e-05, 0.00073131738351, 0.0004310477698,
0.00519235806746, 0.00995606223948, -0.00192862200551, 0.00257535479622,
0.00452502363079, 0.00132008444764, -0.0033720597776, 0.00464986350895,
0.00318540398886, 0.0036471909126, 0.00699275905022, 0.00104501002309,
7.98829235871906e-05, 0.00428168852619, 0.00637386122264, 0.00108682812851,
-0.00029879124879, -6.91039695800782e-05), b0001 = c(0.01800092688004,
0.02011469070317, 0.02028566151573, 0.0179124206973, 0.01941521590629,
0.01846852368848, 0.01564274610647, 0.01763088342656, 0.01782806190528,
0.01595252071591, 0.02045645453128, 0.01934534979926, 0.01892079941012,
0.01859074925398, 0.01759062061265, 0.01593436604294, 0.01809677718956,
0.01698907749719, 0.01956008653302, 0.01292521854622, 0.01781296155008,
0.01589757045382, 0.01700943508274, 0.01673888527351, 0.01999578814362,
0.01689588730579, 0.01474826635901, 0.01762957227617, 0.01844433337313,
0.01426185254875, 0.01647358935637, 0.01852101980912, 0.01705020482026,
0.01867477359887, 0.01474757340631, 0.01722894148788, 0.01746963005864,
0.01632960496522, 0.01466473971168, 0.01463956672595, 0.01772861915606,
0.01702957873434, 0.01740538934663, 0.02136003322368, 0.02565897334663,
0.01291107725161, 0.01753092898439, 0.01687893043972, 0.01409828681218,
0.01588293753652, 0.01540482711573), region = structure(c(3L,
4L, 4L, 3L, 4L, 4L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 1L, 1L, 1L, 1L,
3L, 3L, 2L, 3L, 2L, 1L, 1L, 3L, 1L, 4L, 1L, 4L, 2L, 2L, 4L, 2L,
3L, 1L, 1L, 3L, 4L, 2L, 2L, 3L, 1L, 3L, 3L, 4L, 2L, 3L, 4L, 3L,
1L, 4L), .Label = c("MW", "NE", "SE", "W"), class = "factor")), .Names = c("Ndifference",
"d1011", "netm1011", "b0001", "region"), class = "data.frame", row.names = c(NA,
-51L))
Upvotes: 2
Views: 309
Reputation: 226577
Do you have a particular reason for using Gamma with an identity link (family=Gamma(link="identity")
)? That's probably the proximal cause of your problem. Using Gamma with a log link (family(Gamma(link="log"))
) seems to work fine. (As is often the case, fitting a log-Normal model, i.e. log-transforming the response variable and using lmer
rather than glmer
, is even faster and more robust.)
In general, using link functions that do not constrain the response to be in the domain of the chosen family is numerically unstable, because it is easy for the model to wander into infeasible regions. Translating that out of statistical jargon, if you're going to assume the data are Gamma-distributed, then you may have trouble if, on the way to the optimal solution, the program tries out values that lead to negative mean estimates. In contrast to the identity link (or the canonical inverse link for the Gamma family), the log link forces all the predictions to be positive.
lme4
is admittedly more fragile than it could be about this, but I don't see any particular reason to assume your responses are Gamma distributed with an identity link ...
Based on what I'm seeing from the model predictions and diagnostics, it might be probably be worth dropping the negative value from your data set - forcing it to be (just barely) positive is making it an outlier on the log scale ...
library(lme4)
m1 <- lmer(log(Ndifference+74337) ~ netm1011 + d1011 +
b0001 + (1|region), data=dd)
m2 <- glmer(Ndifference+74337 ~ netm1011 + d1011 +
b0001 + (1|region), data=dd,
family=Gamma(link="log"))
Comparing pred and obs (ignoring the outlier):
pairs(cbind(obs=log(dd$Ndifference+74337),
lognorm=predict(m1),gamma=predict(m2)),gap=0,
xlim=c(10,15),ylim=c(10,15))
Upvotes: 2