Reputation: 71
I have data which looks like this:
dat <- data.frame(ID = rep(1:4, each = 4),
score = c(0, 0, -3, -5, 0, -4, -4, -4, -1, -1, -2, -3, 0, 1, -2, -2),
visit1 = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
visit2 = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
visit4 = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1),
visit = c(v1, v1, v1, v1, v2, v2, v2, v2, v3, v3, v3, v3, v4, v4, v4, v4),
trt = c(A, A, A, A, A, A, A, A, P, P, P, P, P, P, P, P))
Now, I fitted a mixed model repeated measures using the mmrm package (https://cran.r-project.org/web/packages/mmrm/index.html) as follows:
fit <- mmrm(formula = score ~ visit1 + visit2 + visit4 + visit2:trt + visit4:trt + us(ID | visit), data = dat)
I am interested in the LS means for the treatment groups at the last visit (visit4/V4) and the contrast of this. With the following model fit2 <- mmrm(formula = score ~ visit + visit:trt + us(ID | visit), data = dat)
I can simply do emmeans(m1, specs = pairwise ~ treat | visit)
. However, if I try to do this for the case where I use the 0/1 variables for visit (visit1, visit2, and visit4) I get an error: Error in h_df_1d_list(est = est, var = var, v_num = v_num, v_denom = v_denom) : Assertion on 'var' failed: Element 1 is not >= 2.22507e-308.
I have two questions about this all:
Upvotes: 1
Views: 500
Reputation: 6810
This is a pretty long answer but maybe it will help pin down what are the issues with hard-coded dummy variables.
emmeans is designed in a way that you really should code all factors
as class factor
(or ordinal
). For those factors that you want
marginal means for, you have no choice about this. However, for those
who insist on manually coding factors using dummy variables, emmeans
will work with those, but not in the same way. Here, I’ll work through
an illustrative example to show the difference. The following code
simulates a somewhat unbalanced dataset with factors treat
(3 levels)
and dose
(4 levels).
set.seed(24.0306)
dat = expand.grid(treat = factor(LETTERS[1:3]), dose = factor(0:3))
dat = rbind(dat[1:3, ], dat) # more observations for dose 0
dat = dat[-sample(1:15, 2), ] # "lose" a couple of observations
### frequencies of factor combinations
with(dat, table(treat, dose))
## dose
## treat 0 1 2 3
## A 2 0 1 1
## B 2 1 1 1
## C 1 1 1 1
teff = c(-4, 7, -3) # true treat effects
deff = c(-6, -2, 3, 5) # true dose effects
dat$y = apply(cbind(as.numeric(dat$treat), as.numeric(dat$dose)), 1, function(idx) {
34 + teff[idx[1]] + deff[idx[2]] + .6 * rnorm(1)
})
Now we will fit two models, one with dose
as a factor and one with
dose
manually coded with the same dummy variables:
dat = transform(dat,
d0 = 0 + (dose == "0"),
d1 = 0 + (dose == "1"),
d2 = 0 + (dose == "2"),
d3 = 0 + (dose == "3"))
options(contrasts = c("contr.treatment", "contr.poly"))
mod.fac = lm(y ~ treat + dose, dat)
mod.dum = lm(y ~ treat + d1 + d2 + d3, dat)
We used exactly the same coding, so the coefficients are the same:
coef(mod.fac)
## (Intercept) treatB treatC dose1 dose2 dose3
## 24.2031040 11.1261777 0.6237625 3.9153530 8.4476560 10.5616421
coef(mod.dum)
## (Intercept) treatB treatC d1 d2 d3
## 24.2031040 11.1261777 0.6237625 3.9153530 8.4476560 10.5616421
Now compare the EMMs
library(emmeans)
(emm.fac = emmeans(mod.fac, "treat"))
## treat emmean SE df lower.CL upper.CL
## A 29.9 0.215 7 29.4 30.4
## B 41.1 0.179 7 40.6 41.5
## C 30.6 0.197 7 30.1 31.0
##
## Results are averaged over the levels of: dose
## Confidence level used: 0.95
(emm.dum = emmeans(mod.dum, "treat"))
## treat emmean SE df lower.CL upper.CL
## A 35.7 0.316 7 34.9 36.4
## B 46.8 0.273 7 46.1 47.4
## C 36.3 0.263 7 35.7 36.9
##
## Results are averaged over the levels of: d1, d2, d3
## Confidence level used: 0.95
The EMMs are not the same. What gives? Well, I hope you remember that EMMs are marginal averages of a reference grid, so let’s look at those:
ref_grid(mod.fac)
## 'emmGrid' object with variables:
## treat = A, B, C
## dose = 0, 1, 2, 3
ref_grid(mod.dum)
## 'emmGrid' object with variables:
## treat = A, B, C
## d1 = 0, 1
## d2 = 0, 1
## d3 = 0, 1
Hmmmm. So each one of the dummy variables is by default regarded as a factor having two levels. This means that your manual dummy coding makes this look like a four-factor experiment instead of a two-factor one. That makes a difference!
Another way to see what’s happening is to obtain the linear functions
actually being computed. This information is in the linfct
slot of the
object created by emmeans()
. This slot comprises a matrix, and each
row of the matrix contains the weights that are applied to each
regression coefficient. Recall that the regression coefficients
themselves are the same for the two models, so if we were to get the
same linear functions, we’ll get the same EMMs. Here they are:
emm.fac@linfct
## (Intercept) treatB treatC dose1 dose2 dose3
## [1,] 1 0 0 0.25 0.25 0.25
## [2,] 1 1 0 0.25 0.25 0.25
## [3,] 1 0 1 0.25 0.25 0.25
emm.dum@linfct
## (Intercept) treatB treatC d1 d2 d3
## [1,] 1 0 0 0.5 0.5 0.5
## [2,] 1 1 0 0.5 0.5 0.5
## [3,] 1 0 1 0.5 0.5 0.5
Interesting. So each dose
coefficient gets twice the weight for
mod.dum
as for mod.dum
, while both give the same weight to the
intercept and treat
coefficients. The dummy coding is going to produce wrong answers.
There’s a third interesting possibility. As we saw, emmeans by default views any covariate having just two different values as a factor. This is a convenience because often we want indicators to be viewed as factors. But we can force them to be treated as covariates instead:
ref_grid(mod.dum, cov.keep = "")
## 'emmGrid' object with variables:
## treat = A, B, C
## d1 = 0.15385
## d2 = 0.23077
## d3 = 0.23077
With this setup, the EMMs are not really averages anymore; they are
predictions for the three treatments with the covariates d1
, d2
, and
d3
set to the above values, leading to a third set of EMMs
(emm.dum2 = emmeans(mod.dum, "treat", cov.keep = ""))
## treat emmean SE df lower.CL upper.CL
## A 29.2 0.203 7 28.7 29.7
## B 40.3 0.177 7 39.9 40.7
## C 29.8 0.200 7 29.3 30.3
##
## Confidence level used: 0.95
emm.dum2@linfct
## (Intercept) treatB treatC d1 d2 d3
## [1,] 1 0 0 0.1538462 0.2307692 0.2307692
## [2,] 1 1 0 0.1538462 0.2307692 0.2307692
## [3,] 1 0 1 0.1538462 0.2307692 0.2307692
Now since the dummies are just 0s and 1s, the values of these in the
reference grid are just the fractions of each that are equal to 1. So
this suggests how we can get the above results from mod.fac
:
emmeans(mod.fac, "treat", weights = "prop")
## treat emmean SE df lower.CL upper.CL
## A 29.2 0.203 7 28.7 29.7
## B 40.3 0.177 7 39.9 40.7
## C 29.8 0.200 7 29.3 30.3
##
## Results are averaged over the levels of: dose
## Confidence level used: 0.95
That is, manually coded dummies, when treated as covariates, yield proportionally-weighted EMMs instead of the default equally-weighted ones. We can turn this around by modifying what covariate values to use with the dummy coding so that we get the equally-weighted EMMs:
emmeans(mod.dum, "treat", cov.keep = "", cov.reduce = function(x) 0.25)
## treat emmean SE df lower.CL upper.CL
## A 29.9 0.215 7 29.4 30.4
## B 41.1 0.179 7 40.6 41.5
## C 30.6 0.197 7 30.1 31.0
##
## Confidence level used: 0.95
Aha! We have come full circle, as this is the first set of EMMs we obtained from mod.fac
.
All this shows that:
cov.keep = ""
with dummy coding, you will obtain
proportionately-weighted EMMs, provided that your manual coding
corresponds to contr.treatment
coding.All in all, life is really much simpler, in my opinion, if you avoid using manualy-created dummies.
What happens if you use contr.sum
dummies: e1 = d1 - d0
,
e2 = d2 - d0
, e3 = d3 - d0
? Hint: In a way, this is simpler
because you don’t need to specify cov.keep
, as these dummies have
three different values and will be treated as covariates from the
get-go.
Upvotes: 2
Reputation: 125
Thanks for using the mmrm
package!
Feel free to post questions or problems also on our GitHub issues at https://github.com/openpharma/mmrm/issues. We will see the questions there much faster than here (I just stumbled across this post accidentally).
Regarding your question, I tried to make this into a full reproducible example, see below. (Note that in the model formula you most likely don't want the 0/1 dummy variables but instead the factor visit
interaction with trt
)
v1 <- "visit1"
v2 <- "visit2"
v3 <- "visit3"
v4 <- "visit4"
A <- "Active"
P <- "Placebo"
dat <- data.frame(ID = rep(1:4, each = 4),
score = c(0, 0, -3, -5, 0, -4, -4, -4, -1, -1, -2, -3, 0, 1, -2, -2),
visit1 = c(1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0),
visit2 = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
visit4 = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1),
visit = c(v1, v1, v1, v1, v2, v2, v2, v2, v3, v3, v3, v3, v4, v4, v4, v4),
trt = c(A, A, A, A, A, A, A, A, P, P, P, P, P, P, P, P))
library(mmrm)
fit <- mmrm(formula = score ~ visit:trt + us(ID | visit), data = dat)
However, the problem that is detected by the most recent (0.2.2) mmrm
package version is shown then with the following error message:
time points have to be unique for each subject, detected following duplicates in data:
visit ID
2 visit1 1
3 visit1 1
4 visit1 1
6 visit2 2
7 visit2 2
8 visit2 2
10 visit3 3
11 visit3 3
12 visit3 3
14 visit4 4
15 visit4 4
16 visit4 4
In general, the MMRM only makes sense (and is allowed by the package) when you have unique visits (time points) per individual. The rationale is that if you had duplicate time points (like twice visit 1) it is not clear how the correlation between these should be modelled.
So, bottom line, I think the mmrm
fit itself is already erroneous here. In addition the emmeans
comments from Russ can be helpful, thanks a lot!
Does this help?
Upvotes: 1
Reputation: 6810
This fails because there is no predictor named visit
in your model.
emmeans()
will not work with dummy variables as predictors. The design of emmeans()
is to use a reference grid for all combinations of predictor values, and it makes no sense to make a prediction for visit1 = 1
and visit2 = 1
, for example.
Re-fit your model with factors as predictors. Or with score
as a multivariate response. I don't know what mmrm()
requires, but if it doesn't support one of those possibilities, it should.
Upvotes: 1