user2626597

Reputation: 61

How to choose best imputed data using mice

Using the mice package, I imputed a dataset like this:

imp <- mice(nhanes)

By default it generates 5 imputed datasets, i.e. 5 candidate values for each missing entry of each variable:

imp$imp$bmi
#      1    2    3    4    5
#1  35.3 30.1 26.3 28.7 27.2
#3  30.1 22.0 30.1 28.7 22.0
#4  21.7 27.2 25.5 24.9 21.7
#6  24.9 25.5 24.9 27.5 22.5
#10 20.4 33.2 26.3 27.2 27.4
#11 22.0 27.2 27.2 30.1 22.0
#12 27.4 20.4 27.2 27.2 20.4
#16 30.1 30.1 27.2 22.5 29.6
#21 27.4 27.2 26.3 22.0 30.1

I do not understand how to choose the best imputed data.

For example, for bmi (above), which of the 5 columns is the best choice?

Thank you

Upvotes: 0

Views: 2271

Answers (2)

Steffen Moritz

Reputation: 7730

The whole concept of mice is that you have multiple imputed datasets.

If you only want one imputed dataset, you can use single imputation packages such as missForest, imputeR or VIM, whose syntax is sometimes a little easier to use and understand.

The advantage of a multiple imputation package like mice is that the multiple imputed datasets help account for the uncertainty introduced by the imputation itself.

You would not pick one of the imputed datasets. Instead, you would run your analysis on all 5 (or more) of them.

By doing this, you see how much the results of your analysis vary across imputations. Afterwards you pool the results into a single set of estimates. mice helps you through this process.
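If you want to look at the completed datasets themselves, a minimal sketch is shown here. It assumes the default imputation settings on nhanes, and the seed value is arbitrary and only there for reproducibility:

```r
library(mice)

# Impute with the default m = 5; seed and printFlag are illustrative choices
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Completed dataset number 1 (observed values plus the first set of imputations)
first <- mice::complete(imp, 1)
head(first)

# All completed datasets stacked, with .imp identifying the imputation
long <- mice::complete(imp, "long")
table(long$.imp)
```

Extracting a single completed dataset this way is handy for inspection, but analysing only that one dataset ignores the imputation uncertainty described above.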

A typical mice workflow would look like this:

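library(mice)

# 1. Perform the imputations (m = 2 datasets, maxit = 2 iterations)
imp <- mice(nhanes, maxit = 2, m = 2)

# 2. Fit the model on each imputed dataset (in this case m = 2)
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))

# 3. Pool the results (stored as `pooled` to avoid masking the pool() function)
pooled <- pool(fit)

# Print the pooled results
summary(pooled)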
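To see how much the estimates differ between imputations before pooling, you can inspect the individual fits. This is only a sketch: it uses m = 5 and an arbitrary seed rather than the settings above, and it relies on the per-imputation fits being stored in the analyses element of the object returned by with():

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))

# One column of coefficients per imputed dataset
coefs <- sapply(fit$analyses, coef)
colnames(coefs) <- paste0("imp", seq_along(fit$analyses))
round(coefs, 3)

# Spread of each coefficient across the imputations
coef_sd <- apply(coefs, 1, sd)
coef_sd
```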

Upvotes: 2

mmarks

Reputation: 1219

There isn't a best dataset. Selecting a single dataset only accounts for the within-dataset variation/error, not for the variation between the imputed datasets.

This is why analyses such as regression should use the with() and pool() functions when working with imputed data.
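As a rough illustration of the two variance components, the sketch below combines them by hand for the chl coefficient using Rubin's rules (total variance = within + (1 + 1/m) * between). The model, the seed and the variable names are illustrative assumptions, and in practice pool() does this for you:

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
m <- length(fit$analyses)

# Per-imputation estimate and squared standard error of the chl coefficient
est <- sapply(fit$analyses, function(f) coef(f)[["chl"]])
u   <- sapply(fit$analyses, function(f) vcov(f)["chl", "chl"])

q_bar     <- mean(est)                  # pooled estimate
u_bar     <- mean(u)                    # within-imputation variance
b         <- var(est)                   # between-imputation variance
total_var <- u_bar + (1 + 1 / m) * b    # Rubin's rules total variance

c(estimate = q_bar, within = u_bar, between = b, se = sqrt(total_var))
```

Using one dataset alone would report only its own within-dataset variance, so its standard error would typically be too small.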

Upvotes: 1
