Reputation: 710
I am not sure whether I need to provide a reproducible example, since this is more of a general question. After running mice(), the package returns m multiple imputed datasets, which can be extracted with the complete() function.
However, I am confused about which dataset I should use for my subsequent analysis (descriptive estimation, model building, etc.).
Questions:
1. Do I need to extract a specific dataset, e.g. complete(imp, 1), or should I use the whole imputed dataset, e.g. complete(imp, "long", inc = TRUE)?
2. If I use complete(imp, "long", inc = TRUE), how do I compute descriptives such as a mean or a proportion? For example, suppose I analyze the long data in SPSS. Should I split the data by imputation number (one subset per each of the m imputed datasets) and manually average the results? Is that how it should be done?
Thanks for your help.
Upvotes: 1
Views: 1095
Reputation: 1607
Multiple imputation consists of the following three steps:
1. Imputation
2. Analysis
3. Pooling
In the first step, m imputed datasets are generated. In the second step, the data analysis (for example, a regression) is applied to each dataset separately. Finally, in the third step, the analysis results are pooled into a final result. Different pooling techniques are implemented for different parameters.
Here is a nice link describing the pooling in detail: mice Vignettes
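The three steps above can be sketched in R as follows. This is only a minimal illustration, assuming the nhanes example data set that ships with mice; the model formula, m = 5, and the seed are arbitrary choices made for the example, not something taken from the question.

```r
library(mice)

# Step 1: imputation. nhanes is a small example data set bundled with mice
# (an assumption for this sketch; substitute your own data frame).
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 2: analysis. with() fits the same model on each of the m imputed
# data sets and returns the m fits together in one "mira" object.
fit <- with(imp, lm(bmi ~ age + chl))

# Step 3: pooling. pool() combines the m sets of estimates into one result.
pooled <- pool(fit)
summary(pooled)
```

Note that no single completed data set is ever picked out for the analysis; the mids object returned by mice() is passed along whole.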
Upvotes: 0
Reputation: 46
You should run your statistical analysis on each of the m imputed data sets individually, then pool the results. This allows you to take into account the additional uncertainty introduced by the imputation procedure. The mice package has this functionality built in. For example, to fit a simple linear model you would do this:
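fit <- with(imp, lm(y ~ x1 + x2))  # fit the model on each of the m imputed data sets
est <- pool(fit)                   # combine the m fits using Rubin's rules
summary(est)                       # pooled estimates and standard errors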
Check out ?pool and ?mira
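For a simple descriptive such as a mean, the same idea can be applied by hand to the long format. The sketch below is an illustration under assumptions: it uses the nhanes example data from mice and the bmi column, and it applies Rubin's rules manually rather than through pool(). It uses the default include = FALSE so that the original incomplete data (.imp = 0) is not mixed into the averages.

```r
library(mice)

# Example data and settings are assumptions made for this sketch.
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Long format: all m completed data sets stacked, identified by .imp = 1..m.
long <- complete(imp, "long")

# Mean of bmi within each imputed data set, and its sampling variance
# (the squared standard error of the mean).
means <- as.numeric(tapply(long$bmi, long$.imp, mean))
vars  <- as.numeric(tapply(long$bmi, long$.imp, function(v) var(v) / length(v)))

m     <- length(means)
q_bar <- mean(means)              # pooled estimate: average of the m means
u_bar <- mean(vars)               # within-imputation variance
b     <- var(means)               # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b  # total variance (Rubin's rules)
pooled_se <- sqrt(t_var)

c(estimate = q_bar, se = pooled_se)
```

So splitting by imputation number and averaging gives the pooled point estimate, but the standard error needs the between-imputation term as well. An intercept-only model, with(imp, lm(bmi ~ 1)) followed by pool(), should reproduce the same estimate without the manual arithmetic.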
Upvotes: 3