Reputation: 125
I am running a linear mixed-effects model in R, and I'm not sure how to include a covariate of no interest in the model, or even how to decide if I should do that.
I have two within-subject variables, let's call them A and B with two levels each, with lots of observations per participant. I'm interested in how their interaction changes across 4 groups. My outcome is reaction time. At the simplest level, I have this model:
RT ~ 1 + A*B*Groups + (1+A | Subject ID)
I would like to add Gender as a covariate of no interest. I have no theoretical reason to assume it affects anything, but it's really imbalanced across groups, so I'd like to include it. The first part of my question is: What is the best way to do this?
Is it this model:
RT ~ 1 + A*B*Groups + Gender + (1+A | Subject ID)
or this:
RT ~ 1 + A*B*Groups*Gender + (1+A | Subject ID)
Or is there some other way? My worry about the second model is that it inflates the number of terms somewhat unreasonably. I'm also worried about overfitting.
The second part of my question: When selecting the best model, when should I add the covariate to see if it makes any difference at all? Let me explain what I mean.
Let's say I start with the simplest model I mentioned above, but without the slope for A, so this:
RT ~ 1 + A*B*Groups + (1| Subject ID)
Should I add the covariate first, either as a main effect ( + Gender) or as part of the interaction (*Gender), and then see if adding a slope for A makes a difference (by using the anova() function), or can I go ahead with adding the slope (which is theoretically more important) first, and then see if gender matters at all?
Upvotes: 0
Views: 3458
Reputation: 50728
Following are some suggestions regarding your two questions.
I would recommend an iterative modelling strategy.
Start with
RT ~ 1 + A*B*Groups*Gender + (1+A | Subject ID)
and see if the problem is tractable. The model above includes the additive effects as well as all interaction terms between A, B, Groups and Gender.
If the problem is not tractable, discard the interaction terms between Gender and the other covariates, and model
RT ~ 1 + A*B*Groups + Gender + (1+A | Subject ID)
It's difficult to make a statement about potential overfitting without any details on the number of observations.
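As a rough sketch of the two-step strategy with lme4: the data below are simulated stand-ins, and every column name, sample size and effect size is an assumption of mine rather than something taken from your study. I use Subject as the grouping column because a name containing a space would need backticks in the formula. Both models are fitted with maximum likelihood (REML = FALSE) so that a likelihood-ratio comparison of the fixed effects is meaningful.

```r
library(lme4)

# Hypothetical data: 40 subjects in 4 groups, Gender deliberately imbalanced
# across groups, 10 trials per A x B cell per subject.
set.seed(1)
n_subj <- 40
subj <- data.frame(
  Subject = factor(seq_len(n_subj)),
  Groups  = factor(rep(paste0("g", 1:4), each = 10)),
  Gender  = factor(rep(rep(c("f", "m"), 4), times = c(8, 2, 6, 4, 4, 6, 2, 8))),
  u0      = rnorm(n_subj, 0, 50),   # random intercepts
  u1      = rnorm(n_subj, 0, 20)    # random slopes for A
)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     trial = 1:10, Subject = subj$Subject)
d <- merge(cells, subj, by = "Subject")
d$RT <- 500 + 30 * (d$A == "a2") + 20 * (d$B == "b2") +
  d$u0 + d$u1 * (d$A == "a2") + rnorm(nrow(d), 0, 80)

# Step 1: full model with Gender in every interaction.
m_full <- lmer(RT ~ 1 + A * B * Groups * Gender + (1 + A | Subject),
               data = d, REML = FALSE)

# Step 2 (fallback): Gender as an additive covariate only.
m_add <- lmer(RT ~ 1 + A * B * Groups + Gender + (1 + A | Subject),
              data = d, REML = FALSE)

# Rough tractability checks: singular fit and number of fixed effects.
isSingular(m_full)
length(fixef(m_full))
length(fixef(m_add))

# Likelihood-ratio test of the additive model against the full model.
anova(m_add, m_full)
```

If the full model produces convergence warnings or a singular fit on your real data, that is one practical sign that the problem is not tractable in that form and that the additive model is the safer choice.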
Concerning your second question: Generally, I would recommend a Bayesian approach; take a look at the rstan-based brms R package, which allows you to use the same lme4/glmm formula syntax, making it easy to translate models. Model comparison and predictive performance are very broad terms. There exist various ways to explore and compare the predictive performance of these types of nested/hierarchical Bayesian models. See for example the papers by Piironen and Vehtari and by Vehtari and Ojanen.
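To illustrate how directly the formulas carry over, here is a minimal brms sketch on simulated stand-in data. The data, column names, and the small chain and iteration settings are my assumptions, chosen only to keep the example quick; the default priors are used purely for illustration, and a real analysis would need more iterations and considered priors. Approximate leave-one-out cross-validation via loo() is one of several possible ways to compare predictive performance, not the only one.

```r
library(brms)

# Hypothetical data: 20 subjects in 4 groups, Gender imbalanced across groups,
# 5 trials per A x B cell per subject.
set.seed(1)
n_subj <- 20
subj <- data.frame(
  Subject = factor(seq_len(n_subj)),
  Groups  = factor(rep(paste0("g", 1:4), each = 5)),
  Gender  = factor(rep(rep(c("f", "m"), 4), times = c(4, 1, 3, 2, 2, 3, 1, 4))),
  u0      = rnorm(n_subj, 0, 50)    # random intercepts
)
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                     trial = 1:5, Subject = subj$Subject)
d <- merge(cells, subj, by = "Subject")
d$RT <- 500 + 30 * (d$A == "a2") + 20 * (d$B == "b2") +
  d$u0 + rnorm(nrow(d), 0, 80)

# Same formula syntax as in lme4; small sampler settings for speed only.
fit_add <- brm(RT ~ 1 + A * B * Groups + Gender + (1 + A | Subject),
               data = d, chains = 2, iter = 1000, seed = 1)
fit_full <- brm(RT ~ 1 + A * B * Groups * Gender + (1 + A | Subject),
                data = d, chains = 2, iter = 1000, seed = 1)

# Approximate leave-one-out cross-validation for each model, then compare.
loo_add  <- loo(fit_add)
loo_full <- loo(fit_full)
cmp <- loo_compare(loo_add, loo_full)
print(cmp)
```

The model listed first in the loo_compare() output has the higher estimated expected log predictive density; the difference should be read against its standard error rather than taken at face value.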
Upvotes: 1