Reputation: 131
I have been experimenting with MICE on data from Kaggle but am having trouble imputing a categorical variable. I was working on this notebook (https://www.kaggle.com/rtatman/animal-bites) and was trying to impute the missing species values (SpeciesIDDesc). However, none of the NA values change after I run MICE. Below is the code I have right now.
library(tidyverse)
library(lubridate)
library(mice)
# Kaggle link with data: https://www.kaggle.com/rtatman/animal-bites
data <- read_csv("Health_AnimalBites.csv",
                 col_types = list(BreedIDDesc = col_character(),
                                  release_date = col_datetime()))
data_mice_one <- data %>%
  filter(!is.na(victim_zip),
         !is.na(bite_date),
         !is.na(WhereBittenIDDesc)) %>%
  mutate(month = month(bite_date, label = TRUE)) %>%
  select(SpeciesIDDesc,
         victim_zip,
         month)
imputed_data_one <- mice(data_mice_one, diagnostics = FALSE, remove_collinear = FALSE, meth="polyreg")
imputed_data_one <- complete(imputed_data_one)
view(imputed_data_one)
sum(is.na(imputed_data_one$SpeciesIDDesc))
I also get a warning message after running the mice() call above: "Warning message: Number of logged events: 2". Upon investigating the logged events, here is what I get:
  it im dep     meth           out
1  0  0     constant SpeciesIDDesc
2  0  0     constant    victim_zip
How do I fix my code? Am I using MICE incorrectly?
Upvotes: 1
Views: 1394
Reputation: 131
I just realized I forgot to convert SpeciesIDDesc and month into factors before calling mice(). The code works now.
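For anyone hitting the same "constant" logged event, here is a minimal sketch of the fix on made-up data. The toy data frame and its column names (species, zip) are hypothetical stand-ins for the Kaggle file, and the explanation is my understanding rather than something the mice documentation spells out: read_csv() returns text columns as character vectors, and mice seems to log character columns as "constant" and skip them, so they need to be factors first.

```r
library(mice)

# Hypothetical toy data standing in for the Kaggle file: two character
# columns, one of which (species) has missing values.
set.seed(1)
toy <- data.frame(
  species = sample(c("DOG", "CAT", "BAT"), 200, replace = TRUE),
  zip     = sample(c("40202", "40203", "40204"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)
toy$species[sample(200, 30)] <- NA

# Convert every character column to a factor before calling mice().
toy[] <- lapply(toy, function(col) if (is.character(col)) factor(col) else col)

# One method per column: polyreg for species, nothing for the complete zip column.
imp <- mice(toy, m = 1, method = c("polyreg", ""), printFlag = FALSE, seed = 1)
completed <- complete(imp)

# Count the NAs left in the imputed column and look at any logged events.
sum(is.na(completed$species))
imp$loggedEvents
```

In the pipeline from the question, the equivalent change would presumably be wrapping the text columns in factor() inside the mutate() step, for example SpeciesIDDesc = factor(SpeciesIDDesc).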
Upvotes: 3