Reputation: 23
I have a categorical variable, var1, that can take the values "W", "B", "A", "M", "N" or "P". I want to impute the missing values, but I know that they cannot be "W" or "B", because those respondents said that they do not belong in those categories. I want to impute var1 but force mice to choose only from the values other than "B" and "W".
Here is sample code for you to use:
df <- data.frame(
age = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
var1 = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160)
)
library(mice)
imp <- mice(df, remove_collinear = FALSE)
Thank you for your help and please let me know if you need more information.
Upvotes: 1
Views: 1551
Reputation: 1624
I think @stats0007 is correct but you will have to re-insert the deleted rows in all the m imputed datasets (in your case, all 5 imputed datasets). Using your example, this is how I would do it.
First, remove all "W" and "B" cases and store them in a separate data.frame:
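df <- data.frame(
  age = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
  var1 = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
  var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160),
  stringsAsFactors = TRUE # var1 must be a factor (the default before R 4.0) to be imputed
)
# Keep rows not containing B and W (rows with a missing var1 are kept)
df2 <- df[which(df$var1 != "B" & df$var1 != "W" | is.na(df$var1)), ]
df2$var1 <- droplevels(df2$var1) # Drop the now-unused "B" and "W" levels
df3 <- df[df$var1 %in% c("B", "W"), ] # Store deleted rows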
Next, impute the data without these deleted cases. mice reports a logged event because var1categ is now constant: every remaining row has var1categ = 1.
library(mice)
imp=mice(df2, remove_collinear = FALSE)
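To see which column is behind the logged event, you can check which columns of the reduced data have only one distinct observed value. This is a minimal base-R sketch using the example data; the helper name is_constant is made up for illustration. mice also records these events in imp$loggedEvents.

```r
# Example data from the question
df <- data.frame(
  age = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
  var1 = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
  var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160)
)

# Rows kept for imputation: everything except observed "B" and "W"
df2 <- df[is.na(df$var1) | !df$var1 %in% c("B", "W"), ]

# Hypothetical helper: TRUE if a column has at most one distinct observed value
is_constant <- function(x) length(unique(x[!is.na(x)])) <= 1

# Names of the columns that are constant in the reduced data
constant_cols <- names(df2)[vapply(df2, is_constant, logical(1))]
print(constant_cols)
```

A constant column carries no information for the imputation model, so it is harmless here that mice leaves it out as a predictor.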
Finally, insert the deleted cases back into each of the imputed datasets 1:5. There is probably a better way, but a for loop works.
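# Stack the imputed datasets in long format; include = TRUE keeps the
# original data as .imp = 0, which as.mids() needs later
comp_imp <- complete(imp, "long", include = TRUE)
# Create an empty data frame
data <- data.frame()
# For the original data (.imp = 0) and each imputation 1:5
for (i in unique(comp_imp$.imp)) {
  # Create a .imp variable and .id variable in the dataset with the deleted rows
  df3$.imp <- i
  df3$.id <- (max(comp_imp$.id) + 1):(max(comp_imp$.id) + nrow(df3))
  # Put .imp and .id first; selecting by name keeps the order stable across iterations
  df3 <- df3[, c(".imp", ".id", "age", "var1", "var1categ", "ht")]
  # Bind the new rows to the imputed dataset
  df_temp <- rbind(comp_imp[comp_imp$.imp == i, ], df3)
  data <- rbind(data, df_temp)
}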
data now contains all the imputed values and the original "B" and "W" observed values. You could transform this back to a mids object for further use in the mice package.
# Transform into a mids object for further use
imp_tot <- as.mids(data)
Upvotes: 1
Reputation: 7730
I think the following approach should work:
Since you only have missing data in var1, and you are sure that none of the missing values is "W" or "B", you don't need the "W" and "B" cases for the imputation anyway.
Note: The approach would be different if you also had missing data in other columns.
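As a rough sketch of this idea, using the example data from the question: drop the "W" and "B" rows, then drop the unused factor levels so that only the remaining categories are candidates for imputation. The names keep and df_sub are made up for illustration, and the mice call is left as a comment because it needs the mice package.

```r
# Example data from the question, with var1 stored as a factor
df <- data.frame(
  age = c(24, 37, 58, 65, 70, 84, 56, 36, 48, 23, 15),
  var1 = c("B", "W", NA, "A", NA, "P", "N", NA, "M", NA, "B"),
  var1categ = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  ht = c(156, 169, 180, 175, 168, 165, 171, 158, 160, 175, 160),
  stringsAsFactors = TRUE
)

# Keep the missing values and every observed value other than "B" or "W"
keep <- is.na(df$var1) | !df$var1 %in% c("B", "W")
df_sub <- df[keep, ]

# Remove the "B" and "W" levels, which no longer occur in the subset
df_sub$var1 <- droplevels(df_sub$var1)
print(levels(df_sub$var1))

# Impute on the subset (requires the mice package; not run here):
# imp <- mice::mice(df_sub)
```

The missing values stay in df_sub, so they are still imputed; only the observed "B" and "W" rows are set aside.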
Upvotes: 1