Reputation: 3158
I'm using library(mice) to impute missing data. I want a way to tell mice that the ID variables should be included in the imputed data set but not used for the imputations.
For instance:
# make a silly data frame with missing data
library(tidyverse)
library(magrittr)
library(mice)

d1 <- data.frame(
  id = str_c(
    letters[1:20] %>%
      rep(each = 5),
    1:5 %>%
      rep(times = 20)
  ),
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
)

# set 5 random values to NA in every column except id
d1[, -1] %<>%
  map(
    function(i) {
      i[sample(1:100, 5, replace = FALSE)] <- NA
      i
    }
  )
This is the returned mids object:
m1 <- d1 %>%
  select(-id) %>%
  mice
How can I include d1$id as a variable in each of the imputed data frames?
Upvotes: 3
Views: 2689
Reputation: 766
Niek's answer is the correct way to go, but I have also noticed that character variables are automatically caught and removed from mice's predictor matrix on the basis of being "constant" (maybe because mice sees character vectors as being entirely NA). As a result, it seems to me that any character-type variables you include in your dataset will be passed through the imputation step without being used as predictors for imputation.
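One way to check this on your own data is to look at what mice logs during a dry run. The sketch below uses a small made-up data frame (toy, with a character id column) purely for illustration; whether and how the column is flagged may depend on your mice version, so treat the output as something to inspect rather than a guaranteed result.

```r
library(mice)

set.seed(1)
# hypothetical toy data: a character id plus two numeric columns with some NAs
toy <- data.frame(
  id = paste0("s", 1:50),
  v1 = rnorm(50),
  v2 = rnorm(50),
  stringsAsFactors = FALSE
)
toy$v1[c(3, 17, 28)] <- NA
toy$v2[c(5, 21, 40)] <- NA

ini <- mice(toy, maxit = 0)   # dry run, no iterations
ini$loggedEvents              # variables mice flagged or dropped, if any
ini$predictorMatrix[, "id"]   # all zeros would mean id is not used as a predictor
```

If id shows up in loggedEvents and its column in the predictor matrix is all zeros, it is being carried along without contributing to the imputation models.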
Upvotes: -1
Reputation: 1624
There are two ways. First, simply append id to the imputed datasets:
d2 <- complete(m1, 'long', include = TRUE) # imputed datasets in long format (including the original)
d3 <- cbind(id = d1$id, d2) # rows keep their order within each dataset, so `id` is recycled across them
m2 <- as.mids(d3) # and transform back to mids object
This ensures that id has no role in the imputation process, but is a bit sloppy and prone to error. Another way is to simply remove it from the predictor matrix.
The 2011 manual by Van Buuren & Groothuis-Oudshoorn says: "The user can specify a custom predictorMatrix, thereby effectively regulating the number of predictors per variable. For example, suppose that bmi is considered irrelevant as a predictor. Setting all entries within the bmi column to zero effectively removes it from the predictor set ... will not use bmi as a predictor, but still impute it."
To do this:
ini <- mice(d1, maxit = 0) # dry run without iterations to get the predictor matrix
pred1 <- ini$predictorMatrix # this is your predictor matrix
pred1[, 'id'] <- 0 # set all id column values to zero to exclude it as a predictor
m1 <- mice(d1, predictorMatrix = pred1) # use the new matrix in mice
You can also prevent mice from imputing the variable, but as it contains no missing values this is not necessary (mice will skip it automatically).
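For a variable that does have missing values, a minimal sketch of skipping its imputation is to blank out its entry in the method vector taken from the same dry run. The example rebuilds a data frame shaped like d1 from the question so that it runs on its own; meth is a name introduced here for illustration.

```r
library(mice)

set.seed(1)
# stand-in for d1 from the question: an id column plus three numeric columns
d1 <- data.frame(
  id = paste0(rep(letters[1:20], each = 5), rep(1:5, times = 20)),
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
)
for (v in c("v1", "v2", "v3")) d1[sample(100, 5), v] <- NA

ini <- mice(d1, maxit = 0)     # dry run
meth <- ini$method             # one imputation method per column
meth["id"] <- ""               # empty string: do not impute this column
pred1 <- ini$predictorMatrix
pred1[, "id"] <- 0             # and do not use it as a predictor either

m1 <- mice(d1, method = meth, predictorMatrix = pred1, printFlag = FALSE)
head(complete(m1, 1))          # id is carried along unchanged
```

Here id has no missing values, so the method entry is already empty and the assignment changes nothing; the same pattern applies to any column you want carried along untouched.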
Upvotes: 4