Reputation: 435
I would like to impute missing data with the mice package. My dataset contains the columns "A" to "G", but I only want to impute the values of columns C and D.
In this article (https://www.r-bloggers.com/2016/06/handling-missing-data-with-mice-package-a-simple-approach/) it is explained how to exclude variables from being a predictor or being imputed - but I would like to use mice the other way round: I want to specify which variables ARE imputed - so only C and D should be imputed.
Is this possible?
Thank you!
Upvotes: 5
Views: 4074
Reputation: 3235
Answer
Just invert the logic: in the method vector, set every variable that is not one of your variables of interest to "":
meth[!names(meth) %in% c("C", "D")] <- ""
Example: Only impute Petal.Length and Petal.Width
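library(mice)
# Create a copy of iris with missing values
data <- ampute(iris, prop = 0.1)$amp
# Dry run (no iterations) to obtain the default method vector
init <- mice(data, maxit = 0)
meth <- init$method
# Switch off imputation for everything except the two target columns
meth[!names(meth) %in% c("Petal.Length", "Petal.Width")] <- ""
mice(data, method = meth)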
Rationale
You can supply a vector to the method argument of mice::mice. This vector should contain the methods that you want to use to impute the variables you want to impute. The linked article first does a dry run (init <- mice(data, maxit = 0)), whose output contains a preset method vector (init$method). For my example, it looks like this:
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
       "pmm"        "pmm"        "pmm"        "pmm"        "pmm"
You can avoid variables being imputed by setting the method to "". This is one way to exclude variables. As I show with my example, you can invert that logic, thus ending up with only the variables you want to include.
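To check that the approach above really leaves the other columns alone, here is a minimal sketch on toy data. The assumptions are mine and not part of the original example: missing values are introduced by hand with sample(), only the four numeric iris columns are used, and the names keep, other, pred, imp and completed are just illustrative. As I understand the mice documentation, an incomplete column that is not imputed but still serves as a predictor can leave NAs in the imputed columns, so the sketch also zeroes those predictor columns. Whether you want to drop them as predictors is a modelling choice.

```r
library(mice)

set.seed(1)
# Toy data (assumption): numeric iris columns with 15 NAs placed in each column
data <- iris[, 1:4]
for (v in names(data)) {
  data[sample(nrow(data), 15), v] <- NA
}

keep  <- c("Petal.Length", "Petal.Width")   # columns to impute
other <- setdiff(names(data), keep)         # columns to leave untouched

# Dry run to get the default method vector and predictor matrix
init <- mice(data, maxit = 0)
meth <- init$method
pred <- init$predictorMatrix

# Only impute the columns in `keep`
meth[!names(meth) %in% keep] <- ""

# Columns of the predictor matrix are predictors: stop the non-imputed,
# still-incomplete columns from being used to predict the targets
pred[, other] <- 0

imp <- mice(data, method = meth, predictorMatrix = pred,
            printFlag = FALSE, seed = 1)
completed <- complete(imp)   # first completed data set

colSums(is.na(data))         # NA counts before imputation
colSums(is.na(completed))    # NA counts afterwards
```

The two colSums() calls let you compare the missing-value counts per column before and after: the columns in keep should have none left, while the columns in other keep theirs.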
Upvotes: 4