Alexia S

Reputation: 23

Imputing a categorical variable with MICE but restricting the possible values

I have a categorical variable, var1, that can take the values "W", "B", "A", "M", "N" or "P". I want to impute the missing values, but I know they cannot be "W" or "B", because those respondents said that they do not belong to either category. How can I impute var1 while forcing mice to choose only from the values other than "B" and "W"?

Here is sample code for you to use:

df <- data.frame(
  age        = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15), 
  var1       = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"), 
  var1categ  = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  ht         = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160)
)

imp <- mice(df, remove_collinear = FALSE)

Thank you for your help and please let me know if you need more information.

Upvotes: 1

Views: 1551

Answers (2)

Niek

Reputation: 1624

I think @stats0007 is correct, but you will have to re-insert the deleted rows into each of the m imputed datasets (in your case, all 5). Using your example, this is how I would do it.

First, remove all "W" and "B" cases and store them in a separate data.frame.

df=data.frame(age=c(24,37,58,65,70,84, 56, 36, 48,23,15), 
              var1 =c("B","W", NA, "A",NA, "P","N", NA, "M",NA, "B"), 
              var1categ=c(0,0, 1, 1, 1,1,1,1,1,1, 0),
              ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160))

df2 <- df[which(df$var1 != "B" & df$var1 != "W" | is.na(df$var1)), ]  # Keep rows that are not "B" or "W" (missing values stay)
df3 <- df[df$var1 %in% c("B", "W"), ]  # Store the deleted rows

Next, impute the data without these deleted cases. mice reports a logged event because var1categ is now a constant: every remaining row has var1categ = 1.

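library(mice)
imp <- mice(df2, remove_collinear = FALSE)
# Assumed missing step: the loop below expects comp_imp to be the long-format completed data
comp_imp <- complete(imp, action = "long", include = TRUE)  # include = TRUE keeps the original data as .imp = 0, which as.mids() needs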

Finally, insert the deleted cases back into each of the imputed datasets. There is probably a better way, but a for loop works.

# Create an empty data frame
data <- data.frame()

# For each imputation 1:5
for(i in unique(comp_imp$.imp)){

  # Create a .imp variable and .id variable in the dataset with the deleted rows
  df3$.imp <- i
  df3$.id <- (max(comp_imp$.id)+1):(max(comp_imp$.id)+nrow(df3))
  df3 <- df3[, names(comp_imp)]  # Match the column order of comp_imp by name, not by position

  # Bind the new rows to the imputed dataset
  df_temp <- rbind(comp_imp[comp_imp$.imp == i,],df3)
  data <- rbind(data, df_temp)
}

data now contains all the imputed values and the original "B" and "W" observed values. You could transform this back to a mids object for further use in the mice package.

# Transform into a mids object for further use
imp_tot <- as.mids(data)
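
The for loop above is one way to put the held-out rows back. As a possible alternative, here is a minimal base R sketch that does the same with lapply and do.call(rbind, ...). The helper name add_back is hypothetical, and the small comp_imp below is a hand-made stand-in for the long-format output of complete(), not real imputation results, so the example runs without mice.

```r
# Hand-made stand-in for complete(imp, "long", include = TRUE); values are illustrative only
comp_imp <- data.frame(
  .imp = rep(0:2, each = 3),
  .id  = rep(1:3, times = 3),
  age  = rep(c(58, 65, 70), times = 3),
  var1 = c(NA, "A", NA,  "M", "A", "P",  "N", "A", "A")
)

# Rows that were set aside before imputation (observed "B"/"W" cases)
df3 <- data.frame(age = c(24, 37), var1 = c("B", "W"))

# Hypothetical helper: append the held-out rows to every .imp block
add_back <- function(long, held_out) {
  # Give the held-out rows ids that do not clash with existing ones
  new_ids <- max(long$.id) + seq_len(nrow(held_out))
  pieces <- lapply(unique(long$.imp), function(i) {
    extra <- cbind(.imp = i, .id = new_ids, held_out)
    # Select columns by name so the order always matches `long`
    rbind(long[long$.imp == i, ], extra[, names(long)])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

data_all <- add_back(comp_imp, df3)
```

Selecting columns by name avoids depending on column positions, and building the pieces in a list avoids growing a data frame inside the loop.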

Upvotes: 1

Steffen Moritz

Reputation: 7730

I think the following approach should work:

  1. Completely remove all "W" and "B" cases from your dataset.
  2. Perform the imputation with mice.

Since you only have missing data in var1 (where you are sure that neither "W" nor "B" occurs among the missing values), you don't need the "W" and "B" cases for the imputation anyway.
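
The two steps above can be sketched roughly as follows, using the data from the question. Two things here are assumptions rather than part of the original suggestion: var1 is converted to a factor so that it is treated as categorical, and droplevels() is applied as a precaution so that "B" and "W" no longer exist as levels in the data passed to mice. The mice call itself is left commented out so the sketch runs in base R.

```r
df <- data.frame(
  age       = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
  var1      = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
  var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  ht        = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160)
)

# Assumption: store var1 as a factor so it is handled as a categorical variable
df$var1 <- factor(df$var1)

# Step 1: keep only rows whose var1 is missing or is not "B"/"W"
keep <- is.na(df$var1) | !(df$var1 %in% c("B", "W"))
df_keep <- droplevels(df[keep, ])  # also drops the now-unused "B" and "W" levels

levels(df_keep$var1)  # remaining categories available for imputation

# Step 2 (needs the mice package), shown as a comment only:
# imp <- mice::mice(df_keep)
```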

Note: The approach would be different if you also had missing data in other columns.

Upvotes: 1
