tomw

Reputation: 3158

Include ID variable in imputed data frame

I'm using library(mice) to impute missing data. I want a way to tell mice that the ID variable should be included in the imputed data sets but not used for the imputations.

For instance

#making a silly data frame with missing data
library(tidyverse)
library(magrittr)
library(mice)

d1 <- data.frame(
  id = str_c(
    letters[1:20] %>% 
      rep(each = 5),
    1:5 %>% 
      rep(times  = 20)
    ),
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
  )

d1[, -1] %<>%
  map(
    function(i){

      i[sample(1:100, 5)] <- NA # set 5 randomly chosen entries to NA

      i
      }
    )

This is the returned mids object

m1 <- d1 %>% 
  select(-id) %>% 
  mice

How can I include d1$id as a variable in each of the imputed data frames?

Upvotes: 3

Views: 2689

Answers (2)

ila

Reputation: 766

Niek's answer is the correct way to go, but I have also noticed that character variables are automatically caught and removed from mice's predictor matrix on the basis of being "constant" (maybe because mice sees character vectors as being entirely NA). As a result, it seems to me that any character-type variables you include in your dataset will be passed through the imputation step without being used as predictors for imputation.

Upvotes: -1

Niek

Reputation: 1624

There are two ways. First, simply append id to the imputed datasets:

d2 <- complete(m1, 'long', include = TRUE) # imputed datasets in long format (including the original)
d3 <- cbind(d2, id = d1$id) # rows keep their original order within each dataset, so `id` is recycled across them
m2 <- as.mids(d3) # and transform back to mids object
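
A possible variant of the same idea, sketched under assumptions: rather than relying on row order, look up each row's id through the `.id` column that `complete()` adds in long format. The toy data and the `set.seed()` call are illustrative only, and the sketch assumes `d1` keeps its default row names.

```r
library(mice)

set.seed(1) # illustrative seed, not part of the original question

# Toy data in the spirit of the question: an id column plus three numeric columns
d1 <- data.frame(
  id = paste0(rep(letters[1:20], each = 5), rep(1:5, times = 20)),
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
)
for (v in c("v1", "v2", "v3")) d1[sample(100, 5), v] <- NA

# Impute without the id column, as in the question
m1 <- mice(d1[, c("v1", "v2", "v3")], printFlag = FALSE)

# Long format: the original data (.imp == 0) followed by the imputed copies
d2 <- complete(m1, "long", include = TRUE)

# Attach id by matching .id against the row names of d1 instead of by position
d2$id <- d1$id[match(d2$.id, rownames(d1))]
```

The resulting long data frame can then be handed to `as.mids()` in the same way as `d3`.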

This ensures that id has no role in the imputation process, but it relies on the rows staying in their original order, which makes it a bit sloppy and prone to error. Another way is to keep id in the data and simply remove it from the predictor matrix.

The 2011 manual by Van Buuren & Groothuis-Oudshoorn says: "The user can specify a custom predictorMatrix, thereby effectively regulating the number of predictors per variable. For example, suppose that bmi is considered irrelevant as a predictor. Setting all entries within the bmi column to zero effectively removes it from the predictor set ... will not use bmi as a predictor, but still impute it."

To do this:

ini <- mice(d1, maxit = 0) # dry run without iterations to get the predictor matrix

pred1 <- ini$predictorMatrix # this is your predictor matrix
pred1[, 'id'] <- 0 # set all id column values to zero to exclude it as a predictor

m1 <- mice(d1, predictorMatrix = pred1) # use the new matrix in mice

You can also prevent mice from imputing the variable, but as it contains no missing values this is not necessary (mice will skip it automatically).
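
For completeness, here is a minimal end-to-end sketch of the predictor-matrix approach, including the optional step of blanking the imputation method for id. The toy data, the seed, and the names `meth1` and `imp1` are assumptions made for illustration. Setting the method to `""` should be redundant here because id has no missing values.

```r
library(mice)

set.seed(1) # illustrative seed, not part of the original question

# Toy data in the spirit of the question: an id column plus three numeric columns
d1 <- data.frame(
  id = paste0(rep(letters[1:20], each = 5), rep(1:5, times = 20)),
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
)
for (v in c("v1", "v2", "v3")) d1[sample(100, 5), v] <- NA

# Dry run to obtain the default predictor matrix and method vector
ini <- mice(d1, maxit = 0)

pred1 <- ini$predictorMatrix
pred1[, "id"] <- 0 # never use id as a predictor

meth1 <- ini$method
meth1["id"] <- "" # never impute id (expected to be empty already)

m1 <- mice(d1, predictorMatrix = pred1, method = meth1, printFlag = FALSE)

# Each completed data set carries the id column along unchanged
imp1 <- complete(m1, 1)
```

Because id stays in the data passed to `mice()`, every data set returned by `complete()` contains it, with no need to re-attach it afterwards.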

Upvotes: 4
