How to decide when and how to include covariates in a linear mixed-effects model in lme4

Question

I am fitting a linear mixed-effects model in R with lme4, and I'm not sure how to include a covariate of no interest in the model, or even how to decide whether I should include it.

I have two within-subject variables, A and B, each with two levels, and many observations per participant. I'm interested in how their interaction differs across 4 groups. My outcome is reaction time (RT). The simplest model I have is:

RT ~ 1 + A*B*Groups + (1+A | Subject ID)

I would like to add Gender as a covariate of no interest. I have no theoretical reason to assume it affects anything, but it is strongly imbalanced across groups, so I'd like to include it. The first part of my question is: what is the best way to do this?

Is it this model:

RT ~ 1 + A*B*Groups + Gender + (1+A | Subject ID)

or this:

RT ~ 1 + A*B*Groups*Gender + (1+A | Subject ID)

Or should I do it some other way? My worry about the second model is that it inflates the number of terms in the model somewhat unreasonably. I'm also worried about overfitting.
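To make the difference in size concrete, here is a minimal sketch that counts the fixed-effect coefficients each formula implies. It assumes A, B and Gender are two-level factors and Groups has four levels; the level names and the `design` data frame are made up for illustration.

```r
# Hypothetical design grid: one row per combination of factor levels.
design <- expand.grid(
  A      = factor(c("a1", "a2")),
  B      = factor(c("b1", "b2")),
  Groups = factor(paste0("g", 1:4)),
  Gender = factor(c("f", "m"))
)

# Number of fixed-effect coefficients (columns of the design matrix),
# including the intercept, under default treatment coding.
n_base     <- ncol(model.matrix(~ A * B * Groups, design))
n_additive <- ncol(model.matrix(~ A * B * Groups + Gender, design))
n_full     <- ncol(model.matrix(~ A * B * Groups * Gender, design))

c(base = n_base, additive = n_additive, full = n_full)
```

Under these assumptions the additive version adds a single coefficient, while the full interaction doubles the number of fixed effects.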

The second part of my question: When selecting the best model, when should I add the covariate to see if it makes any difference at all? Let me explain what I mean.

Let's say I start with the simplest model I mentioned above, but without the slope for A, so this:

RT ~ 1 + A*B*Groups + (1| Subject ID)

Should I add the covariate first, either as a main effect (+ Gender) or as part of the interaction (*Gender), and then test whether adding a random slope for A improves the fit (using the anova() function)? Or can I add the slope first, since it is theoretically more important, and then check whether Gender matters at all?
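For reference, the slope-first ordering could be sketched as below. This is only an illustration on simulated data, not a recommendation of either ordering: the data frame `d`, the column name `SubjectID`, the ±0.5 coding of A and B, and all effect sizes are assumptions invented for the example. The models are fitted with ML (`REML = FALSE`) so that `anova()` compares them directly.

```r
library(lme4)

set.seed(1)

# Hypothetical simulated data: 40 subjects in 4 groups, a 2x2 within-subject
# design, 20 trials per cell. Every number here is made up.
n_subj <- 40
subj <- data.frame(
  SubjectID = factor(seq_len(n_subj)),
  Groups    = factor(rep(paste0("g", 1:4), each = 10)),
  Gender    = factor(sample(c("f", "m"), n_subj, replace = TRUE)),
  b0        = rnorm(n_subj, 0, 40),  # subject-specific intercept deviations
  bA        = rnorm(n_subj, 0, 20)   # subject-specific deviations in the A effect
)
cells <- expand.grid(A = c(-0.5, 0.5), B = c(-0.5, 0.5), trial = 1:20)
d <- merge(subj, cells)  # no shared columns, so this is a cross join
d$RT <- 500 + d$b0 + (30 + d$bA) * d$A + 15 * d$B + 10 * d$A * d$B +
  rnorm(nrow(d), 0, 50)

# Step 1: random intercept only.
m0 <- lmer(RT ~ A * B * Groups + (1 | SubjectID), data = d, REML = FALSE)
# Step 2: add the random slope for A.
m1 <- lmer(RT ~ A * B * Groups + (1 + A | SubjectID), data = d, REML = FALSE)
# Step 3: add Gender as an additive covariate.
m2 <- lmer(RT ~ A * B * Groups + Gender + (1 + A | SubjectID),
           data = d, REML = FALSE)

anova(m0, m1)  # likelihood-ratio test for the random slope of A
anova(m1, m2)  # likelihood-ratio test for the Gender main effect
```

The covariate-first ordering would simply swap steps 2 and 3. Note that likelihood-ratio tests on variance components, such as the slope test here, tend to be conservative because the null value lies on the boundary of the parameter space.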
