Which MICE-imputed data set should I use in subsequent analyses?

Question

I am not sure whether I need to provide a reproducible example, since this is more of a general question. After running mice(), I get m imputed data sets, which I can extract with the complete() function.

However, I am confused about which data set I should use for my subsequent analyses (descriptive estimation, model building, etc.).

Questions: 1. Do I need to extract a specific data set, e.g. complete(imp, 1), or should I use all of the imputed data sets together, e.g. complete(imp, "long", inc = TRUE)?

2. If it is the latter, complete(imp, "long", inc = TRUE), how do I compute descriptives such as means and proportions? For example, if I analyze the long data in SPSS, should I split the file by imputation number (one group per imputed data set) and average the results manually? Is that how it should be done?

Thanks for your help.

John · Accepted Answer

You should run your statistical analysis on each of the m imputed data sets separately and then pool the results. Pooling (mice uses Rubin's rules) accounts for the additional uncertainty introduced by the imputation procedure, which a single completed data set would ignore. The mice package has this functionality built in. For example, to fit a simple linear model:

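library(mice)
fit <- with(imp, lm(y ~ x1 + x2))  # fit the model on each imputed data set (returns a mira object)
est <- pool(fit)                    # combine the m fits with Rubin's rules
summary(est)                        # pooled estimates, standard errors, and p-values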

See ?pool and ?mira for details.
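On the follow-up about descriptives: splitting by imputation, computing the statistic in each data set, and averaging (as described in the question) gives the pooled point estimate, but the standard error must also include the between-imputation variance. Below is a minimal base-R sketch of Rubin's rules for a mean, assuming a data frame laid out like the output of complete(imp, "long") with an `.imp` column. The variables `bmi` and `smoker` and their values are toy data, and `pool_mean()` is a hypothetical helper, not a mice function. If the data were exported with inc = TRUE, the rows with .imp == 0 hold the original incomplete data and should be dropped before pooling.

```r
# Toy data shaped like complete(imp, "long"): an .imp column (1..m) plus variables.
# Values are made up for illustration only.
long_data <- data.frame(
  .imp   = rep(1:3, each = 4),
  bmi    = c(22, 25, 27, 30,  23, 25, 26, 31,  22, 24, 28, 30),
  smoker = c(0, 1, 0, 1,      0, 1, 1, 1,      0, 0, 0, 1)
)

# Drop the original (unimputed) rows, present if exported with include = TRUE.
long_data <- long_data[long_data$.imp > 0, ]

# Hypothetical helper: pool the mean of x across imputations with Rubin's rules.
pool_mean <- function(x, imp) {
  groups <- split(x, imp)
  m <- length(groups)
  q <- vapply(groups, mean, numeric(1))                            # estimate per imputation
  u <- vapply(groups, function(g) var(g) / length(g), numeric(1))  # squared SE per imputation
  qbar <- mean(q)                   # pooled point estimate
  ubar <- mean(u)                   # within-imputation variance
  b <- var(q)                       # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  c(estimate = qbar, se = sqrt(total))
}

pool_mean(long_data$bmi, long_data$.imp)
# A proportion is the mean of a 0/1 indicator; var(g)/n approximates p(1 - p)/n for large n.
pool_mean(long_data$smoker, long_data$.imp)
```

Here the pooled bmi estimate is the average of the three per-imputation means (26, 26.25, 26), about 26.08, while the standard error grows with how much those means disagree. For model-based estimates, pool() applies the same rules, so the with()/pool() route is usually simpler.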
