RyanKao

Reputation: 331

What exactly does complete in mice do?

I am researching how to use multiple imputation results. The following is my understanding; please let me know if there are mistakes.

Suppose you have a data set with missing values and want to run a regression analysis. You can perform multiple imputation with m = 5, run the regression on each of the 5 imputed data sets, and then "pool" the coefficient estimates from these m = 5 models via Rubin's rules (for example with the pool() function in the mice package).

My question is this: mice has a function complete(), and the manual says you can extract a completed data set with complete(object).

But if I run mice with m = 5, does it still make sense to use complete()? Which imputation results will complete() give me?

Also, does it make sense if I only use mice with m = 1? Thank you.

Upvotes: 6

Views: 6050

Answers (2)

jay.sf

Reputation: 73572

You probably overlooked that mice::complete() uses action = 1 as its default, which "returns the first imputed data set" (see ?mice::complete). A single imputed data set on its own is of little use for analysis.

You should definitely use action = "long", which stacks all m imputed data sets into one data frame, to account for the "multiplicity" of the multiple imputation!
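To make the difference between the action values concrete, here is a minimal sketch. It assumes the nhanes example data that ships with mice (25 rows); the seed and the object names are only illustrative.

```r
library(mice)

# nhanes is a small example data set bundled with mice (25 rows, some NAs)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

first <- mice::complete(imp)          # default action = 1: first imputed data set only
third <- mice::complete(imp, 3)       # the third imputed data set
long  <- mice::complete(imp, "long")  # all 5 data sets stacked, plus .imp and .id columns
all5  <- mice::complete(imp, "all")   # a list with one data frame per imputation

nrow(first)       # same number of rows as nhanes
nrow(long)        # m times as many rows
table(long$.imp)  # how many rows belong to each imputation
```

The .imp column in the long format identifies the imputation and .id identifies the original row, so the individual data sets can always be recovered from the stacked one.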

No, it makes no sense at all to use m = 1 (apart from debugging), because every imputation is based on a random process and you have to pool the results (using any method whatsoever) to account for the variation. Often m > 20 is recommended [1, 2].
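Rubin's rules show why a single imputation is not enough. The notation below is an assumption for illustration and is not taken from the mice documentation: \hat{Q}_i is the estimate from imputed data set i and U_i is its estimated variance.

```latex
\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2, \qquad
T = \bar{U} + \left( 1 + \frac{1}{m} \right) B
```

Here \bar{Q} is the pooled estimate, \bar{U} the within-imputation variance, B the between-imputation variance, and T the total variance. With m = 1 the divisor m - 1 in B is zero, so the extra uncertainty caused by the missing data cannot be estimated at all.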

Basically, multiple imputation works as follows:

  1. Create m imputation processes with a random component, to obtain
  2. m slightly different imputed data sets.
  3. Analyze each imputed data set to get slightly different parameter estimates.
  4. Combine results, calculating the variation in parameter estimates.
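The four steps above can be sketched in a few lines of R. This is only an illustration: it assumes the nhanes example data from mice, and the regression formula is arbitrary.

```r
library(mice)

# Steps 1-2: run m = 5 imputations with a random component,
# giving 5 slightly different imputed data sets (stored in a mids object)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 3: fit the same model on each imputed data set
fits <- with(imp, lm(bmi ~ age + chl))

# Step 4: combine the 5 sets of estimates with Rubin's rules
pooled <- pool(fits)
summary(pooled)
```

Note that complete() is never called here: with() and pool() work directly on the object returned by mice().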

(Also see multiple-imputation-in-a-nutshell for a brief overview.)

Upvotes: 11

Noah

Reputation: 4414

When you use mice, you get an object that is not the imputed data set. You cannot perform operations on it directly without using the special functions in mice. If you want to extract the actual imputed data sets, you use complete(), the output of which is a data.frame with one row per individual per imputation (if using the "long" format). If you are doing any analysis with your imputed data that cannot be performed within mice, you need to create this data set first.
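As a rough sketch of that workflow, again assuming the nhanes example data and illustrative object names, the long format can be analysed with ordinary R tools, one imputation at a time:

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
class(imp)  # a "mids" object, not a data frame

# Extract the stacked imputed data sets as a plain data frame
long <- mice::complete(imp, action = "long")

# Any per-imputation computation now works with base R,
# e.g. the mean of bmi within each imputed data set
bmi_means <- tapply(long$bmi, long$.imp, mean)
mean(bmi_means)  # pooled point estimate: the average over imputations

# Models fitted outside mice can still be pooled afterwards
fit_list <- lapply(split(long, long$.imp), function(d) lm(bmi ~ age, data = d))
summary(pool(as.mira(fit_list)))
```

The important point is that each imputed data set is analysed separately and only the results are combined; fitting one model to the whole stacked data frame would understate the standard errors.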

Upvotes: 1
