MDStat

Reputation: 435

Imputation of specific columns with mice()

I would like to impute missing data with the mice package. My dataset contains the columns "A" to "G", but I only want to impute the values in columns C and D.

This article (https://www.r-bloggers.com/2016/06/handling-missing-data-with-mice-package-a-simple-approach/) explains how to exclude variables from being used as predictors or from being imputed. I would like to use mice the other way round: I want to specify which variables ARE imputed, so that only C and D are imputed.

Is this possible?

Thank you!

Upvotes: 5

Views: 4074

Answers (1)

slamballais

Reputation: 3235

Answer

Just invert the logic: in the method vector, set the entry of every variable that is not one of your variables of interest to "":

meth[!names(meth) %in% c("C", "D")] <- ""

Example: Only impute Petal.Length and Petal.Width

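library(mice)
# Generate missing values in iris
data <- ampute(iris, prop = 0.1)$amp
init <- mice(data, maxit = 0)  # dry run, only to get the default method vector
meth <- init$method
meth[!names(meth) %in% c("Petal.Length", "Petal.Width")] <- ""
imp <- mice(data, method = meth)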

Rationale

You can supply a vector to the method argument of mice::mice. It holds, for each variable, the imputation method to use. The example in the linked article first does a dry run (init <- mice(data, maxit = 0)), whose output contains a default method vector (init$method). For my example, it looks like this:

Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
       "pmm"        "pmm"        "pmm"        "pmm"        "pmm"

You can prevent a variable from being imputed by setting its method to "". That is how variables are normally excluded. As my example shows, you can invert that logic and blank out everything except the variables you want, so that only those are imputed.
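To check that this does what the question asks, here is a minimal sketch on a made-up data frame with columns A to G. The data, the seed, and the choice of which rows are missing are my assumptions, not the asker's data. B, C and D have missing values, but only C and D keep an imputation method. The missing rows are kept disjoint on purpose, so that no row to be imputed has a missing value in a predictor that is not imputed.

```r
library(mice)

set.seed(1)
# Hypothetical toy data: seven numeric columns named A to G
df <- as.data.frame(matrix(rnorm(100 * 7), ncol = 7))
names(df) <- LETTERS[1:7]

# Missing values in B, C and D, in non-overlapping rows
df$B[1:10]  <- NA
df$C[11:20] <- NA
df$D[21:30] <- NA

# Dry run to obtain the default method vector, then blank out
# everything except the columns we want imputed
meth <- mice(df, maxit = 0)$method
meth[!names(meth) %in% c("C", "D")] <- ""

imp <- mice(df, method = meth, printFlag = FALSE, seed = 1)
completed <- complete(imp)

# Number of remaining missing values per column
na_left <- colSums(is.na(completed))
print(na_left)
```

In the printed counts, B should still have its missing values while C and D should have none. If a non-imputed column like B is missing in the same rows as C or D, I would expect those cells to be harder to fill, so it is worth checking the counts on your own data.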

Upvotes: 4
