llewmills

Reputation: 3568

Missing values in lmer vs gls

I have a longitudinal dataset on which I am running an HLM analysis using lmer in lme4. I would like to compare the results of this analysis with results from the same data using gls in the nlme package.

There are multiple measures for each participant in the dataset and several of the participants have values missing at one or more timepoints.

lmer does not seem to have a problem with this, but when I ran the same analysis using gls I got an error message:

Error in na.fail.default(list(id = c(1001L, 1002L, 1003L, 1004L, 1005L,  : 
  missing values in object

So I have two questions:

(1) How does lmer deal with the missing values?

(2) Why does gls require zero missing values when lmer seems to have no issue with NAs? I would rather not lose all that power by being forced to exclude every participant who has missing data, so if there is some way to make gls treat missing values the same way lmer does, that would be ideal. (Otherwise multiple imputation, I suppose?)

Upvotes: 2

Views: 5298

Answers (1)

Niek

Reputation: 1624

The default na.action of lmer is na.omit, which means that any row with a missing value on one or more of the variables in the model is dropped before fitting. The default in gls is na.fail, which is why gls throws an error when any variable in the model has missing values. You can get the same handling of missing data in gls by typing gls(...., na.action = na.omit), but in both cases you're excluding rows with missing data, so the power issue remains regardless of which function you use.
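As a rough sketch of the point above, the snippet below fits both models to the same data with na.omit and checks that they use the same rows. The data are an assumption for illustration only: the sleepstudy dataset that ships with lme4, with 15 responses blanked out at random. The compound-symmetry correlation structure is likewise just one plausible choice for the gls side, not necessarily the one you need.

```r
library(lme4)
library(nlme)

# Illustrative data (an assumption, not the asker's dataset):
# sleepstudy from lme4, with 15 responses set to NA at random.
set.seed(1)
d <- sleepstudy
d$Reaction[sample(nrow(d), 15)] <- NA

# lmer: rows with NA in any model variable are dropped by default (na.omit).
m_lmer <- lmer(Reaction ~ Days + (1 | Subject), data = d)

# gls: the default is na.fail, so ask for na.omit explicitly.
m_gls <- gls(Reaction ~ Days,
             correlation = corCompSymm(form = ~ 1 | Subject),
             data = d,
             na.action = na.omit)

# Both fits should use exactly the complete rows.
n_complete <- sum(complete.cases(d[, c("Reaction", "Days", "Subject")]))
n_lmer <- nobs(m_lmer)
n_gls <- m_gls$dims$N
c(complete = n_complete, lmer = n_lmer, gls = n_gls)
```

Note that every subject can still contribute to both fits as long as at least one of their rows is complete; only the individual incomplete rows are dropped.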

Since you have longitudinal data, excluding rows with missing data doesn't necessarily mean excluding participants; it may only mean dropping some observations from some participants. According to Snijders and Bosker (2012), this does not lead to biased estimates, assuming the data are missing at random (MAR). I would start by examining any patterns in the missing data that could lead to bias, e.g. because variables related to the missing-data mechanism are not included in the model. Multiple imputation can be an option but, depending on the circumstances, often does little to alleviate power issues.
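One simple way to start looking at missingness patterns is sketched below in base R. Everything in it is hypothetical: the data frame d and its columns id, time and y are simulated stand-ins for your own variables, and the logistic regression is only a crude first check of whether missingness is related to an observed variable, not a test of MAR.

```r
# Simulated long-format data (hypothetical names: id, time, y).
set.seed(42)
d <- expand.grid(id = factor(1:20), time = 0:4)
d$y <- 10 + 2 * d$time + rnorm(nrow(d))
d$y[sample(nrow(d), 12)] <- NA

# Indicator for a missing outcome.
d$miss <- is.na(d$y)

# How many values are missing at each timepoint and for each participant?
miss_by_time <- tapply(d$miss, d$time, sum)
miss_by_id <- tapply(d$miss, d$id, sum)

# Participants who would be lost entirely (no observed outcome at all).
n_dropped_ids <- sum(miss_by_id == length(unique(d$time)))

# Crude check: is missingness related to time?
fit_miss <- glm(miss ~ time, family = binomial, data = d)
summary(fit_miss)$coefficients
```

If missingness is clearly related to a variable that is not in your model, that is a reason to consider adding it, or to consider imputation models that include it.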

Upvotes: 2
