Reputation: 125
I'm trying to make a multilevel mediation analysis (as done here and here).
library(data.table)
library(lme4)
library(nlme)
library(magrittr)
library(dplyr)
set.seed(1)
# Simulated data ------------------------------------------------------------------
dt_1 <- data.table(id = rep(1:10, each=4),
time = as.factor(rep(1:4, 10)),
x = rnorm(40),
m = rnorm(40),
y = rnorm(40))
# Melt m and y into z ------------------------------------------------------------------
dt_2 <- dt_1 %>%
mutate(mm = m, .after=x) %>%
melt(id.vars = c("id","time","x","mm"),
na.rm = F,
variable.name = "dv",
value.name = "z") %>%
within({
dy <- as.integer(dv == "y")
dm <- as.integer(dv == "m")
}) %>%
arrange(id,time)
> head(dt_2,4)
id time x mm dv z dm dy
1: 1 1 -0.6264538 -0.1645236 m -0.1645236 1 0
2: 1 1 -0.6264538 -0.1645236 y -0.5686687 0 1
3: 1 2 0.1836433 -0.2533617 m -0.2533617 1 0
4: 1 2 0.1836433 -0.2533617 y -0.1351786 0 1
# lme mediation model ------------------------------------------------------------------
model_lme <- lme(fixed = z ~ 0 +
dm + dm:x + dm:time + #m as outcome
dy + dy:mm + dy:x + dy:time, #y as outcome
random = ~ 0 + dm:x + dy:mm + dy:x | id,
weights = varIdent(form = ~ 1 | dm), #separate sigma^{2}_{e} for each outcome
data = dt_2,
na.action = na.exclude)
Error in MEEM(object, conLin, control$niterEM): Singularity in backsolve at level 0, block 1
# lmer mediation model ------------------------------------------------------------------
model_lmer <- lmer(z ~ 0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time +
(0 + dm:x + dy:mm + dy:x | id) + (0 + dm | time),
data = dt_2,
na.action = na.exclude)
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
I've seen some posts about these error (nlme) / warning (lme4) (eg, this and this), but I didn't figure out what is the problem here.
I checked
X <- model.matrix(~0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time, data=dt_2)
> caret::findLinearCombos(X)
$linearCombos
$linearCombos[[1]]
[1] 7 1 4 5 6
$remove
[1] 7
but I don't quite understand the output.
From the summary of model_lmer
I verify that dm:time4
and time1:dy
coefficients are missing, but why? There are all the possible combinations (0/0, 0/1, 1/0, 1/1) in the dataset.
Fixed effects:
Estimate Std. Error t value
dm 0.30898 0.92355 0.335
dy 0.03397 0.27480 0.124
dm:x 0.21267 0.19138 1.111
dm:time1 -0.19713 1.30583 -0.151
dm:time2 -0.30206 1.30544 -0.231
dm:time3 -0.20828 1.30620 -0.159
dy:mm 0.22625 0.18728 1.208
x:dy -0.37747 0.17130 -2.204
time2:dy 0.29894 0.39123 0.764
time3:dy 0.22640 0.39609 0.572
time4:dy -0.16758 0.39457 -0.425
On the other hand, using time
as numeric gives no error/warning:
# lmer mediation model - time as numeric
model_lmer2 <- lmer(z ~ 0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time +
(0 + dm:x + dy:mm + dy:x | id) + (0 + dm | time),
data = within(dt_2, time <- as.numeric(time)),
na.action = na.exclude)
It's true that one can know dm
from dy
(if one is 1 the other is 0), but if that was the problem, this last model (model_lmer2
) would still give a warning, I guess.
In my real dataset, I could eventually use time
as numeric (although not my first approach), but I would like to understand what the problem is with using it as categorical.
Thank you!
Upvotes: 0
Views: 441
Reputation: 226712
This is really a generic question about linear model construction/formulas in R: it's not mixed-model specific.
Let's look at the names of the columns involved in the multicollinear combination of variables (i.e. columns 7, 1, 4, 5, 6):
cc <- caret::findLinearCombos(X)
colnames(X)[cc$linearCombos[[1]]]
## [1] "dm:time4" "dm" "dm:time1" "dm:time2" "dm:time3"
This is telling us that the main effect of dm
is confounded with the dm:time
interaction; once we know dm:time[i]
for all levels of i
, the main effect of dm
is redundant.
It's too bad that lme
doesn't automatically drop columns to adjust for multicollinearity as lmer
does, and that lmer
doesn't have a super-convenient way to model heteroscedasticity à la varIdent()
; it is possible but it's a nuisance. It would be possible to build the auto-dropping into nlme
, or glmmTMB
(which can also model heteroscedasticity easily), but no-one's gotten around to it yet).
... if you're OK with specifying dm:time
and leaving dm
out of your model then that might be easiest!
You can experiment with what happens with different model specifications:
lc <- function(f) {
X <- model.matrix(f, dt_2)
cc <- caret::findLinearCombos(X)
lapply(cc$linearCombos, function(z) colnames(X)[z])
}
lc(~0 + dm + dm:time)
lc(~0 + dy + dy:time)
lc(~0 + dm + dm:time + dy + dy:time)
lc(~0 + dy + dy:time + dm + dm:time)
or similar stuff looking at the (heads of) the model matrices, column names of the model matrices, etc..
Upvotes: 1