Marta Fernandez

Reputation: 11

Regression model with missing data in dependent variable

modelo <- lm(P3J_IOP ~ PräOP_IOP + OPTyp + P3J_Med, data = na.omit(df))
summary(modelo)

Error:

Error in step(modelo, direction = "backward") : number of rows in use has changed: remove missing values?

I have a lot of missing values in my dependent variable P3J_IOP.

Has anyone any idea how to create the model?

Upvotes: 1

Views: 1096

Answers (1)

Ben Bolker

Reputation: 226097

tl;dr unfortunately, this is going to be hard.

It is fairly difficult to make linear regression work smoothly with missing values in the predictors or the dependent variable (this is true of most statistical modeling approaches, with the exception of random forests). In case it's not clear, the problem with stepwise approaches when there is missing data in the predictors is:

  • incomplete cases (i.e., observations with missing data for any of the current set of predictors) must be dropped in order to fit a linear model;
  • models with different predictor sets will typically have different sets of incomplete cases, leading to the models being fitted on different subsets of the data;
  • models fitted to different data sets aren't easily comparable.
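To make the points above concrete, here is a minimal sketch using the built-in airquality data as a stand-in (not the asker's data): Ozone, used as the response here, and Solar.R both contain NAs. Dropping one predictor changes how many rows lm() keeps, which is exactly the situation step() detects when it stops with the "number of rows in use has changed" error.

```r
# airquality ships with base R; Ozone (the response here) and Solar.R contain NAs
full    <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
reduced <- lm(Ozone ~ Wind + Temp, data = airquality)

# lm() silently drops incomplete rows (the default na.action is na.omit),
# so the two fits rest on different numbers of observations
n_used <- c(full = nobs(full), reduced = nobs(reduced))
print(n_used)

# AIC values from fits on different row subsets are not comparable,
# which is why stepwise selection refuses to continue in this situation
```

The reduced model uses more rows than the full one because it no longer needs Solar.R to be observed.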

You basically have the following choices:

  • drop any predictors with large numbers of missing values, then drop all cases that have missing values in any of the remaining predictors;
  • use some form of imputation, e.g. with the mice package, to fill in your missing data (in order to do proper statistical inference, you need to do multiple imputation, which may be hard to combine with stepwise regression).
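A rough sketch of both options, again with airquality as a stand-in data set; the variable names and the choice of m = 5 imputations are illustrative assumptions, not recommendations for your data. Note that the rows are dropped once, and only on the columns that appear in the model, before any model is fitted. The second part only runs if the mice package is installed.

```r
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Option 1: complete cases, restricted to the model columns only
cc <- na.omit(airquality[, vars])
fit_cc <- lm(Ozone ~ Solar.R + Wind + Temp, data = cc)

# every candidate model now sees the same rows, so step() can compare them
sel <- step(fit_cc, direction = "backward", trace = 0)
print(formula(sel))

# Option 2: multiple imputation with mice (skipped if mice is not installed)
if (requireNamespace("mice", quietly = TRUE)) {
  imp  <- mice::mice(airquality[, vars], m = 5, seed = 1, printFlag = FALSE)
  fits <- with(imp, lm(Ozone ~ Solar.R + Wind + Temp))  # one fit per imputed data set
  print(summary(mice::pool(fits)))                      # pooled via Rubin's rules
}
```

The mice part fits one fixed model to every imputed data set; it does not do any variable selection, which is the part that is hard to combine with multiple imputation.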

There are more advanced tools that do the imputation and the modeling simultaneously, such as the brms package (its documentation covers imputation with brms), but that's a pretty big hammer and a big jump in statistical sophistication if all you want to do is fit a linear model to your data ...

Upvotes: 2
