dixi

Reputation: 710

Which MICE imputed data set to use in succeeding analysis?

I am not sure whether I need to provide a reproducible example, as this is more of a general question. After running mice(), the mice package returns m imputed datasets, which can be extracted with the complete() function.

I am confused, however, about which dataset I should use for my subsequent analysis (descriptive estimation, model building, etc.).

Questions:

1. Do I need to extract one specific dataset, e.g. complete(imp, 1), or should I use the whole stacked set of imputations, e.g. complete(imp, "long", inc = TRUE)?

2. If it is the latter, complete(imp, "long", inc = TRUE), how do I compute descriptives such as a mean or a proportion? For example, I will analyze the long data in SPSS. Should I split the data by imputation number, compute the statistic in each of the m datasets, and average the results manually? Is that how it should be done?

Thanks for your help.

Upvotes: 1

Views: 1095

Answers (2)

Ahmadov

Reputation: 1607

Multiple imputation consists of the following three steps:

1. Imputation
2. Analysis
3. Pooling

In the first step, m imputed datasets are generated. In the second step, the data analysis, such as a regression, is applied to each dataset separately. Finally, in the third step, the m sets of analysis results are pooled into a single final result. Different pooling techniques are implemented for different parameters. Here is a nice link describing the pooling in detail: mice Vignettes
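For a descriptive statistic such as a mean, the three steps can be sketched by hand as below. This is only an illustration under assumptions not in the question: it uses the nhanes example data shipped with mice, picks bmi as the variable, and the object names (per_set, q_bar, u_bar, b, total_var) are made up for the sketch. The pooling step follows Rubin's rules: average the m estimates, then add the average within-imputation variance to the inflated between-imputation variance.

```r
library(mice)

# Step 1, imputation: nhanes is an example data set shipped with mice
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Step 2, analysis: mean of bmi and its sampling variance
# in each of the m completed data sets (rows "est" and "var")
per_set <- sapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  c(est = mean(d$bmi), var = var(d$bmi) / nrow(d))
})

# Step 3, pooling with Rubin's rules
m         <- imp$m
q_bar     <- mean(per_set["est", ])   # pooled point estimate
u_bar     <- mean(per_set["var", ])   # within-imputation variance
b         <- var(per_set["est", ])    # between-imputation variance
total_var <- u_bar + (1 + 1 / m) * b  # total variance of the pooled mean

c(mean = q_bar, se = sqrt(total_var))
```

In practice an intercept-only model, with(imp, lm(bmi ~ 1)) followed by pool(), should give the same pooled mean without the manual arithmetic. Note that only the original, incomplete data (the .imp == 0 rows when inc = TRUE is used) and the m completed data sets taken one at a time are meaningful units; computing a statistic over the whole stacked long file would understate the uncertainty.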

Upvotes: 0

John

Reputation: 46

You should run your statistical analysis on each of the m imputed data sets individually, then pool the results. This takes into account the additional uncertainty introduced by the imputation procedure. The mice package has this functionality built in. For example, to fit a simple linear model you would run:

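# imp is the mids object returned by mice(); y, x1, x2 are placeholder variable names
fit <- with(imp, lm(y ~ x1 + x2))  # fit the model in each imputed data set
est <- pool(fit)                   # combine the m fits using Rubin's rules
summary(est)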

Check out ?pool and ?mira
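To see what with() and pool() are doing, it can help to look inside the returned object. The sketch below again assumes the nhanes example data from mice and an arbitrary model (bmi on age and chl), neither of which comes from the question. The mira object stores one fitted model per imputed data set in its analyses element.

```r
library(mice)

# Example data and model are assumptions chosen for illustration
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ age + chl))

# One lm fit per imputed data set
n_fits <- length(fit$analyses)

# Coefficients differ slightly from one imputation to the next;
# each column is one imputed data set
coefs <- sapply(fit$analyses, coef)
coefs

# Pooled estimates and standard errors across the m fits
summary(pool(fit))
```

The spread across the columns of coefs is the between-imputation variability that pooling accounts for, which is why picking a single data set such as complete(imp, 1) is not recommended for inference.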

Upvotes: 3
