Reputation: 1
After the multiple imputation (pmm method) using the mice package, there are still missing values in my dataset (although the number of missing values was reduced).
I have checked that there was no issue with constant value or multicollinearity as there was no logged event. I have included most auxiliary variables in the multiple imputation. I removed 3 auxiliary variables earlier due to the presence of logged events. But after such removal, there were no logged events. I have also checked that no variables/columns were completely empty, whereas there were about 7 participants who did not answer any part of the survey (so about 7 rows were completely empty).
There are 14 variables in the main analyses and 10 auxiliary variables. All of them were included in the multiple imputation, and all of them contain missing values. All variables in the main analyses are continuous. Of the auxiliary variables, 6 are categorical and 4 are continuous. The categorical variables were coded as factors in R.
I wonder why there were still missing values? Is this normal?
Can anyone please advise how I can get a complete imputed dataset? If not, can I proceed to multiple mediation analysis with those missing values?
I used this code for the multiple imputation:
alldata4.mi <- mice::mice(alldata4, m = 5, method = 'pmm')
Here's the link to part of my dataset: https://drive.google.com/file/d/1s_KNTSp4NlxvLYKhVWSPfYbBf0EeniXx/view?usp=drive_link
I've also checked out the following discussions, but they don't seem to have a relevant answer for my situation.
https://github.com/amices/mice/discussions/350
https://github.com/amices/mice/discussions/349
https://www.statalist.org/forums/forum/general-stata-discussion/general/1470175-missing-imputed-values-still-present-after-doing%C2%A0multiple-imputation-mice
MICE does not impute certain columns, but also does not give an error
Leftover NAs after imputing using mice
Can anyone please help?
Upvotes: 0
Views: 2557
Reputation: 11
If your dataset contains missing values (NA's) after using mice, and you have checked for (multi)collinearity and the classes of your variables, something else you might try is to adjust the specific variables inside your dataset that affect the generation of imputed values.
Option 1:
One way you could do this is to reduce the number of variables in your dataframe that you are running through mice. If you reduce the number of columns in your dataframe and the number of missing values drops, then perhaps some of the variables/columns you are including are influencing the generation of NA values.
Option 2:
Another way you could do this is to create custom variables for your method and your predictor matrix. This method allows you to retain the complete dataframe and specify which imputation method you want for each column, and which columns influence imputation. See the following code:
library(mice)
# create custom method vector: run pmm only on specific columns, skip ("") all others
columns_to_impute <- c("Column1", "Column2", "Etc")
imputation_methods <- ifelse(names(YOUR_DATA_FRAME) %in% columns_to_impute, "pmm", "")
# create custom predictor matrix for pmm (rows = variables being imputed, columns = predictors)
# start from a matrix of all 0s, named after the columns of the data frame
custom_predictor_matrix <- matrix(0, nrow = ncol(YOUR_DATA_FRAME), ncol = ncol(YOUR_DATA_FRAME),
                                  dimnames = list(names(YOUR_DATA_FRAME), names(YOUR_DATA_FRAME)))
# change values to 1 for columns/vars that should influence imputation
custom_predictor_matrix[, c(1, 2, 3, 4, 5)] <- 1
# a variable should not be used to predict itself
diag(custom_predictor_matrix) <- 0
# run imputation using custom method vector and custom predictor matrix (passed by name)
imp1 <- mice(YOUR_DATA_FRAME, m = 1, method = imputation_methods, predictorMatrix = custom_predictor_matrix)
# extract the first completed dataset as a dataframe
DF <- complete(imp1, action = 1L, include = FALSE)
If you're not sure which columns to select as predictors for the imputation, theory and prior research are the right place to start. For your specific sample, you can also use a correlation matrix to check for strong correlations (absolute value greater than 0.70) in your data. Hope this helps.
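The correlation check mentioned above could be sketched as follows. This is a minimal illustration in base R on simulated data; the data frame `toy`, its columns `x1`, `x2`, `x3`, and the 0.70 cut-off as coded here are assumptions for the example, not taken from the asker's dataset. Note that `cor()` only accepts numeric columns, so factors would need to be left out or handled separately.

```r
# Simulated stand-in data (hypothetical): x2 is built from x1, x3 is unrelated noise.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.3),
  x3 = rnorm(n)
)
toy$x2[c(3, 10, 25)] <- NA   # add some missing values
toy$x3[c(5, 12)]     <- NA

# Pairwise correlations, using all rows where both variables are observed.
cors <- cor(toy, use = "pairwise.complete.obs")

# Flag pairs whose absolute correlation exceeds 0.70, ignoring the diagonal.
strong <- abs(cors) > 0.70
diag(strong) <- FALSE

# List the flagged pairs as (target, predictor) candidates.
idx <- which(strong, arr.ind = TRUE)
strong_pairs <- data.frame(
  target    = rownames(cors)[idx[, "row"]],
  predictor = colnames(cors)[idx[, "col"]]
)

print(round(cors, 2))
print(strong_pairs)
```

The flagged pairs are only candidates; the final choice of predictors should still be guided by theory. The mice package also provides `quickpred()`, which builds a predictor matrix from correlations and may be worth a look.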
Upvotes: 1
Reputation: 102
Imputing the data you shared yields a warning:
Warning message:
Number of logged events: 6
Then, inspecting the loggedEvents element of the imputation object informs us about the variables that caused problems in the imputation procedure:
  it im dep     meth             out
1  0  0     constant Relation_status
2  0  0     constant          Income
3  0  0     constant        Religion
4  0  0     constant         Ethinic
5  0  0     constant          Gender
6  0  0     constant      Aboriginal
The problem with these variables is that the columns contain strings, and mice only handles numeric or factor data. Converting the variables to factors, or excluding them from the imputation procedure, solves your problem.
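The conversion to factors could look like the sketch below. It uses a small made-up data frame `toy` (hypothetical values, with column names borrowed from the logged events above) rather than the asker's file, and only base R. The same two lines applied to the real data frame before calling mice should have the same effect, assuming the affected columns are of class character.

```r
# Toy stand-in for the shared data (values are invented for illustration).
toy <- data.frame(
  score  = c(1.2, NA, 3.4, 2.2),
  Gender = c("F", "M", NA, "F"),
  Income = c("low", NA, "high", "low"),
  stringsAsFactors = FALSE
)

# Find the character columns, i.e. the ones reported in the logged events.
is_chr <- vapply(toy, is.character, logical(1))

# Convert only those columns to factors; missing values stay NA.
toy[is_chr] <- lapply(toy[is_chr], factor)

# Check the resulting classes before running the imputation.
col_classes <- vapply(toy, function(x) class(x)[1], character(1))
print(col_classes)
```

After the conversion, rerunning the imputation and inspecting the logged events again is a quick way to confirm that the "constant" entries are gone.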
Upvotes: 0
Reputation: 1
Just posting to respond to this question in case it is helpful - I ran into the same issue where I was including variables in my imputed dataset that I did not want to use as predictors nor did I want them imputed (but I wanted to use them for later analyses after I completed imputations).
What went wrong for me was that I needed to specify both in my imputation method and predictor matrix to ignore/not impute these variables, as previously I had only specified in my imputation method to ignore. Then I was able to conduct the imputations and have all missing values imputed.
Code as below:
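# dry run (maxit = 0) to get the default method vector and predictor matrix
micelong0 <- mice(data_subset_mice, maxit = 0)
meth_long <- micelong0$method
pred_long <- micelong0$predictorMatrix
# impute these variables with pmm
meth_long[c("AA", "BB", "CC", "DD")] <- "pmm"
# do not impute these variables ...
meth_long[names(meth_long) %in% c("A", "B", "C", "D", "E", "F", "G")] <- ""
# ... and do not use them as predictors either
pred_long[, colnames(pred_long) %in% c("A", "B", "C", "D", "E", "F", "G")] <- 0
micelong <- mice(data_subset_mice, method = meth_long, predictorMatrix = pred_long, maxit = 50, m = 5, seed = 612788)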
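The situation described in this post (a variable that is skipped by the method vector but still used as a predictor) can be checked for before running the imputation. The sketch below is a base R illustration only: the data frame `dat`, the method vector `meth` and the matrix `pred` are hand-built, hypothetical stand-ins for the objects a dry run would return, with rows of `pred` taken as the variables being imputed and columns as their predictors.

```r
# Hypothetical toy data: "A" is kept for later analyses and should not be imputed.
dat <- data.frame(
  AA = c(1, NA, 3, 4),
  BB = c(NA, 2, 3, 5),
  A  = c(10, NA, 30, 40)
)
vars <- names(dat)

# Method vector: "" means the column is skipped by the imputation.
meth <- c(AA = "pmm", BB = "pmm", A = "")

# Predictor matrix with every variable predicting every other variable.
pred <- matrix(1, nrow = 3, ncol = 3, dimnames = list(vars, vars))
diag(pred) <- 0

# Helper (hypothetical name): incomplete, skipped columns still used as predictors.
find_risky <- function(dat, meth, pred) {
  vars        <- names(dat)
  has_na      <- colSums(is.na(dat)) > 0
  not_imputed <- meth[vars] == ""
  # a column counts as a predictor if any imputed target has a 1 in it
  used <- colSums(pred[meth[vars] != "", , drop = FALSE]) > 0
  vars[has_na & not_imputed & used]
}

risky <- find_risky(dat, meth, pred)
print(risky)

# Fix as in the post: stop using those columns as predictors.
pred[, risky] <- 0
risky_after <- find_risky(dat, meth, pred)
print(risky_after)
```

Columns returned by the helper are the ones whose missing values could carry over into the imputed variables, which matches the leftover NAs described above.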
Upvotes: 0