MDSF
MDSF

Reputation: 125

Error in MEEM(object, conLin, control$niterEM) / fixed-effect model matrix is rank deficient

I'm trying to make a multilevel mediation analysis (as done here and here).

library(data.table)
library(lme4)
library(nlme)
library(magrittr)
library(dplyr)

set.seed(1)

# Simulated data ------------------------------------------------------------------
dt_1 <- data.table(id = rep(1:10, each=4),
                   time = as.factor(rep(1:4, 10)),
                   x = rnorm(40),
                   m = rnorm(40),
                   y = rnorm(40))

# Melt m and y into z ------------------------------------------------------------------
dt_2 <- dt_1 %>%
  mutate(mm = m, .after=x) %>%
  melt(id.vars = c("id","time","x","mm"),
       na.rm = F,
       variable.name = "dv",
       value.name = "z") %>%
  within({
    dy <- as.integer(dv == "y")
    dm <- as.integer(dv == "m")
    }) %>%
  arrange(id,time)

> head(dt_2,4)
   id time          x         mm dv          z dm dy
1:  1    1 -0.6264538 -0.1645236  m -0.1645236  1  0
2:  1    1 -0.6264538 -0.1645236  y -0.5686687  0  1
3:  1    2  0.1836433 -0.2533617  m -0.2533617  1  0
4:  1    2  0.1836433 -0.2533617  y -0.1351786  0  1

# lme mediation model ------------------------------------------------------------------
model_lme <- lme(fixed = z ~ 0 + 
                             dm + dm:x + dm:time + #m as outcome
                             dy + dy:mm + dy:x + dy:time, #y as outcome
                 random = ~ 0  +  dm:x + dy:mm + dy:x | id, 
                 weights = varIdent(form = ~ 1 | dm), #separate sigma^{2}_{e} for each outcome
                 data = dt_2,
                 na.action = na.exclude)

Error in MEEM(object, conLin, control$niterEM): Singularity in backsolve at level 0, block 1

# lmer mediation model ------------------------------------------------------------------
model_lmer <- lmer(z ~ 0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time +
                     (0  +  dm:x + dy:mm + dy:x | id) + (0 + dm | time), 
                  data = dt_2,
                  na.action = na.exclude)

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

I've seen some posts about these error (nlme) / warning (lme4) (eg, this and this), but I didn't figure out what is the problem here.

I checked

X <- model.matrix(~0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time, data=dt_2)

> caret::findLinearCombos(X)
$linearCombos
$linearCombos[[1]]
[1] 7 1 4 5 6

$remove
[1] 7

but I don't quite understand the output.

From the summary of model_lmer I verify that dm:time4 and time1:dy coefficients are missing, but why? There are all the possible combinations (0/0, 0/1, 1/0, 1/1) in the dataset.

Fixed effects:
         Estimate Std. Error t value
dm        0.30898    0.92355   0.335
dy        0.03397    0.27480   0.124
dm:x      0.21267    0.19138   1.111
dm:time1 -0.19713    1.30583  -0.151
dm:time2 -0.30206    1.30544  -0.231
dm:time3 -0.20828    1.30620  -0.159
dy:mm     0.22625    0.18728   1.208
x:dy     -0.37747    0.17130  -2.204
time2:dy  0.29894    0.39123   0.764
time3:dy  0.22640    0.39609   0.572
time4:dy -0.16758    0.39457  -0.425

On the other hand, using time as numeric gives no error/warning:

# lmer mediation model - time as numeric
model_lmer2 <- lmer(z ~ 0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time +
                     (0  +  dm:x + dy:mm + dy:x | id) + (0 + dm | time), 
                  data = within(dt_2, time <- as.numeric(time)),
                  na.action = na.exclude)

It's true that one can know dm from dy (if one is 1 the other is 0), but if that was the problem, this last model (model_lmer2) would still give a warning, I guess.

In my real dataset, I could eventually use time as numeric (although not my first approach), but I would like to understand what the problem is with using it as categorical.

Thank you!

Upvotes: 0

Views: 441

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226712

This is really a generic question about linear model construction/formulas in R: it's not mixed-model specific.

Let's look at the names of the columns involved in the multicollinear combination of variables (i.e. columns 7, 1, 4, 5, 6):

cc <- caret::findLinearCombos(X)
colnames(X)[cc$linearCombos[[1]]]
## [1] "dm:time4" "dm"       "dm:time1" "dm:time2" "dm:time3"

This is telling us that the main effect of dm is confounded with the dm:time interaction; once we know dm:time[i] for all levels of i, the main effect of dm is redundant.

It's too bad that lme doesn't automatically drop columns to adjust for multicollinearity as lmer does, and that lmer doesn't have a super-convenient way to model heteroscedasticity à la varIdent(); it is possible but it's a nuisance. It would be possible to build the auto-dropping into nlme, or glmmTMB (which can also model heteroscedasticity easily), but no-one's gotten around to it yet).

... if you're OK with specifying dm:time and leaving dm out of your model then that might be easiest!


You can experiment with what happens with different model specifications:

lc <- function(f) {
    X <- model.matrix(f, dt_2)
    cc <- caret::findLinearCombos(X)
    lapply(cc$linearCombos, function(z) colnames(X)[z])
}

lc(~0 + dm + dm:time)
lc(~0 + dy + dy:time)
lc(~0 + dm + dm:time + dy + dy:time)
lc(~0 + dy + dy:time + dm + dm:time)

or similar stuff looking at the (heads of) the model matrices, column names of the model matrices, etc..

Upvotes: 1

Related Questions