Reputation: 65
I have data where "speed of flight" is the response variable and the predictors are group (experimental/control), test (first/second), FL (fuel load, % of lean body mass, ranging from 0 to ~25%), and wing (wing length in mm). Since we tested the same birds twice (first and second test; the experimental group was infected), I want to fit a mixed model by adding a random term ~1|ring. I also added the weight argument for the test variable because of heteroscedasticity.
mod <- lme(speed ~ test * group * FL * wing, weight = ~1|test, random = ~1|ring, data = data, method = "ML")
This is what the full model looks like (I use the nlme package). After that I start the backward selection. I do it manually (choosing the model with the lowest AIC at each step) and then check the result with the stepAIC function (MASS package). In this case the first two steps of the selection go fine, but when I start from the model:
mod3 <- lme(speed ~ test + group + FL + wing + test:group + group:FL + FL:wing + test:group:wing,
            weight = ~1|test, random = ~1|ring, data = data, method = "ML")
I got an error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
As far as I understand, it means that not all interactions of the factors exist in the data. But then I should already have got the same error with the full model, and with other response variables it works fine. If anyone has an idea, I would be glad!
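For reference, the stepAIC check mentioned above is something along these lines (a sketch only; the exact call may differ from what I actually used):
library(MASS)
## backward selection starting from the full ML fit, to compare with the
## manual AIC-based simplification
stepAIC(mod, direction = "backward")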
original data
ring group wing speed_aver FL test
1 XZ13125 E 75 0.62 16.2950000 first
2 XZ13125 E 75 0.22 12.5470149 second
3 XZ13126 E 68 0.39 7.7214876 first
4 XZ13127 C 75 0.52 9.1573643 first
5 XZ13127 C 75 0.17 -1.9017391 second
6 XZ13129 C 73 0.46 10.3821705 first
7 XZ13129 C 73 0.33 -0.5278261 second
8 XZ13140 C 73 0.48 13.0774436 first
9 XZ13140 C 73 0.27 18.0092199 second
10 XZ13144 C 73 0.36 7.5144000 first
11 XZ13144 C 73 0.36 9.6820312 second
12 XZ13146 E 73 0.32 14.3651852 first
13 XZ13146 E 73 0.28 20.8171233 second
14 XZ13159 C 74 0.55 20.2760274 first
15 XZ13159 C 74 0.37 19.1687500 second
16 XZ13209 E 72 0.35 8.1464000 first
17 XZ13209 E 72 0.43 10.9945736 second
18 XZ13213 E 74 0.57 5.3682927 first
19 XZ13213 E 74 0.26 1.3584746 second
20 XZ13220 C 73 0.30 6.0105691 first
21 XZ13220 C 73 0.36 -8.0439252 second
22 XZ13230 E 74 0.44 5.3682927 first
23 XZ13230 E 74 0.31 3.0025000 second
24 XZ13231 C 75 0.28 6.2504000 first
25 XZ13231 C 75 0.37 7.7267717 second
26 XZ13232 C 74 0.34 16.8592857 first
27 XZ13232 C 74 0.33 13.7800000 second
28 XZ13271 C 73 0.32 16.2268116 first
29 XZ13271 C 73 0.28 14.3651852 second
30 XZ13278 E 72 0.45 15.5757353 first
31 XZ13278 E 72 0.37 14.9503704 second
32 XZ13280 C 74 0.33 15.0386861 first
33 XZ13280 C 74 0.36 7.6214286 second
34 XZ13340 E 73 0.62 16.8294964 first
35 XZ13340 E 73 0.26 13.7261194 second
36 XZ13367 E 75 0.42 23.4071895 first
37 XZ13370 E 71 0.25 13.6159091 first
Upvotes: 2
Views: 11875
Reputation: 226192
This is pretty tricky as it turns out. I think the problem is that due to the way you're constructing your second formula, R is not automatically removing collinear variables from the model matrix.
tl;dr this is a bit stream-of-consciousness, but I think the basic take-home points are:

- lme doesn't necessarily check/handle aliasing in a model specification for you (unlike lm, or to a lesser extent lmer);
- your reduced formula specifies a test:group:wing interaction without including the group:wing and test:wing interactions. R lets you do this, but the model doesn't necessarily make sense ... I'm a little bit surprised you ended up with this model specification -- usually stepAIC, and drop1, and R's other built-in model simplification tools try to respect marginality and thus wouldn't let you end up here ...
- if you really want to fit a non-marginal model like this, you can either use lmer (although dealing with heteroscedasticity is harder there), or construct your own numeric dummy variables with model.matrix(), outside the scope of the model-fitting (lm/lme/lmer) function itself ...

For simplicity I'm going to leave out the variance model (weights=varIdent(form=~1|test)) as it doesn't seem to be relevant to this specific problem (I didn't know that a priori, but tests with and without it didn't differ).
library("nlme")
form1 <- speed_aver ~ test * group * FL * wing
form2 <- speed_aver ~ test + group + FL + wing +
    test:group + group:FL + FL:wing +
    test:group:wing
mod <- lme(form1, random = ~1|ring, data = dd, method = "ML")  ## OK
update(mod, form2)
## fails with "Singularity in backsolve" error
What if we try it with lme4?
## ugh, I wish I knew a better way to append to a formula
form1L <- formula(paste(deparse(form1),"(1|ring)",sep="+"))
form2L <- formula(paste(deparse(form2,width=100),"(1|ring)",sep="+"))
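## (aside: reformulate() is one alternative to the paste/deparse approach;
##  untested beyond this example, but it should give equivalent formulas)
form1L <- reformulate(c(labels(terms(form1)), "(1|ring)"), response = "speed_aver")
form2L <- reformulate(c(labels(terms(form2)), "(1|ring)"), response = "speed_aver")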
library("lme4")
mod2 <- lmer(form1L, data=dd)
mod3 <- lmer(form2L, data=dd)
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
Aha! lmer automatically detects that the model matrix is rank-deficient. (lm does this too, and substitutes NA values for the aliased terms.) At present lmer just drops the aliased columns, although with reasonably recent versions of lme4 the (documented but unadvertised) option add.dropped=TRUE to fixef() will put the NA values back in the appropriate places.
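For example (assuming a version of lme4 recent enough to have the add.dropped argument):
fixef(mod3)                      ## the aliased coefficient is silently dropped
fixef(mod3, add.dropped = TRUE)  ## same coefficients, with NA restored in the dropped position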
So let's investigate the model matrices:
X0 <- model.matrix(form1,data=dd)
c(rankMatrix(X0)==ncol(X0)) ## TRUE; both are 16
X <- model.matrix(form2,data=dd)
c(rankMatrix(X)==ncol(X)) ## FALSE; 11<12
Try to identify the aliased columns: the 12th element of svd(X)$d is tiny (1e-15).
ss <- svd(X)
(zz <- zapsmall(ss$v[,12])) ## elements of collinear grouping
## [1] 0.0000000 0.0000000 0.0000000 0.0000000 -0.4472136 0.0000000
## [7] 0.0000000 0.0000000 0.4472136 0.4472136 0.4472136 0.4472136
So the sum of columns 9-12 is exactly the same as column 5 (same values, opposite signs). What's going on here?
colnames(X)[zz!=0]
## [1] "wing" "testfirst:groupC:wing" "testsecond:groupC:wing"
## [4] "testfirst:groupE:wing" "testsecond:groupE:wing"
It looks like we somehow got all of the levels of the test-by-group interaction interacting with wing, along with the wing variable itself ...
mm <- X[,zz!=0]
colnames(mm) <- gsub("(test|group|:wing)","",colnames(mm))
head(mm)
## wing first:C second:C first:E second:E
## 1 75 0 0 75 0
## 2 75 0 0 0 75
## 3 68 0 0 68 0
## 4 75 75 0 0 0
## 5 75 0 75 0 0
## 6 73 73 0 0 0
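Each row has its wing value in exactly one of the four interaction columns, so those four columns sum to the wing column. A quick numerical check (relying on the column positions shown above):
## column 5 of X is "wing"; columns 9-12 are the test:group:wing dummies
all.equal(X[, 5], rowSums(X[, 9:12]))  ## should be TRUE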
I'm still not 100% sure why this happens, but you can see that R expands the three-way interaction to include all four levels of the implicit test-by-group interaction (each multiplied by the continuous wing variable), while the model also contains wing itself -- hence the collinearity. Compare the column names of the two model matrices:
colnames(X)
## [1] "(Intercept)" "testsecond" "groupE"
## [4] "FL" "wing" "testsecond:groupE"
## [7] "groupE:FL" "FL:wing" "testfirst:groupC:wing"
## [10] "testsecond:groupC:wing" "testfirst:groupE:wing"
## "testsecond:groupE:wing"
colnames(X0)
## [1] "(Intercept)" "testsecond"
## [3] "groupE" "FL"
## [5] "wing" "testsecond:groupE"
## [7] "testsecond:FL" "groupE:FL"
## [9] "testsecond:wing" "groupE:wing"
## [11] "FL:wing" "testsecond:groupE:FL"
## [13] "testsecond:groupE:wing" "testsecond:FL:wing"
## [15] "groupE:FL:wing" "testsecond:groupE:FL:wing"
If we define a model that respects marginality, then we're OK again ...
form3 <- speed_aver~test*group*wing+FL*(group+wing)
X1 <- model.matrix(form3,dd)
c(rankMatrix(X1)== ncol(X1)) ## TRUE
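The corresponding mixed-model fit should then go through; here is a sketch (untested on these data; mod4 is an arbitrary name) with the variance structure from above put back in:
mod4 <- lme(form3, random = ~1|ring,
            weights = varIdent(form = ~1|test),
            data = dd, method = "ML")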
And we can replicate the problem more simply this way:
form4 <- speed_aver~wing+test:group:wing
X2 <- model.matrix(form4,dd)
c(rankMatrix(X2)== ncol(X2)) ## FALSE
This model has the three-way interaction (explicitly), but is missing the two-way interactions. If we used ~wing*test*group, or even ~wing+wing*test*group, we would be OK ...
form5 <- speed_aver~wing+test*group*wing
X3 <- model.matrix(form5,dd)
c(rankMatrix(X3)== ncol(X3)) ## TRUE
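Finally, if you really do want to fit the original non-marginal model in lme, the model.matrix() route mentioned at the top could look roughly like the sketch below. This is untested; the qr()-based column selection and the mm_ name prefix are my own illustrative choices, and the column that gets dropped may differ from the one lmer dropped.
## Build the fixed-effect columns for form2 by hand, keep a full-rank
## subset, and pass them to lme() as ordinary numeric covariates.
X_all  <- model.matrix(form2, data = dd)
qrX    <- qr(X_all)                                ## default qr() pivots aliased columns to the end
keep   <- qrX$pivot[seq_len(qrX$rank)]             ## indices of a linearly independent subset
X_fr   <- X_all[, setdiff(keep, 1), drop = FALSE]  ## drop the intercept column (lme adds its own)
colnames(X_fr) <- paste0("mm_", make.names(colnames(X_fr)))  ## syntactic names, no clashes with dd
dd2    <- cbind(dd, X_fr)
form_manual <- reformulate(colnames(X_fr), response = "speed_aver")
mod_manual  <- lme(form_manual, random = ~1|ring, data = dd2, method = "ML")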
Upvotes: 10