I'm about to impute missing values with the mice package, and I need the imputation to depend only on specific columns. I have 24 columns that are used to measure 4 latent variables (using the plspm package). For columns 1-6, I want to impute the NAs using only the content of those 6 columns, and likewise for columns 7-12, 13-18 and 19-24.
I hope that makes sense.
My data structure is:
p1 p2 p3 p4 p5 p6 l1 l2 l3 l4 l5 l6
4 3 5 4 5 N/A 2 1 4 5 1 N/A
4 4 1 3 1 2 1 1 1 1 1 1
5 4 5 4 4 4 4 4 5 5 4 4
5 4 5 5 4 5 4 4 N/A 5 4 4
5 5 5 5 5 5 3 2 5 5 2 2
4 3 4 3 3 3 3 2 3 4 3 2
5 4 5 5 3 4 4 1 5 5 5 4
5 5 5 5 5 5 5 3 4 5 3 4
4 4 4 4 3 N/A 4 4 5 4 3 3
5 4 4 4 3 2 1 3 2 5 1 1
4 4 4 4 5 5 3 4 5 5 3 3
4 3 2 N/A 1 2 N/A 1 2 N/A 1 N/A
3 3 4 4 3 2 1 3 3 3 1 3
5 3 4 4 4 2 3 4 4 4 3 3
4 4 4 5 2 2 2 2 2 2 3 3
5 4 4 4 4 4 4 4 5 5 4 3
4 3 3 3 5 2 2 2 4 4 1 1
5 4 5 4 5 3 1 1 5 5 2 3
4 3 1 3 4 4 2 1 4 3 2 3
4 3 1 4 3 1 2 1 4 4 3 2
3 3 5 4 5 1 2 2 4 5 3 2
4 4 5 3 5 5 2 2 3 4 2 3
4 4 2 3 2 3 2 2 3 4 2 2
5 5 5 5 5 5 4 3 3 3 3 3
5 5 5 5 5 4 4 N/A 5 5 N/A N/A
So I guess it essentially comes down to splitting the data into 4 blocks and then imputing each one. I read about the blocks argument in help(mice), but I'm not sure I can actually use it for this specific task.
The code I've been using so far is:
library(mice)

# Predictive mean matching on the whole data set
temp_pmm <- mice(data_predict,
                 m = 3,
                 maxit = 10,
                 method = "pmm",
                 seed = 2374)
But the way I understand the package, by default it imputes each column using all the other columns in the row, so my latent variable constructs overlap, which is what I am trying to avoid.
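One way to check this understanding is to look at the default predictor matrix that mice builds. The following is only a minimal sketch: toy is a hypothetical stand-in for data_predict (two blocks of three items, no real data), and make.predictorMatrix() is assumed to be available, which requires mice 3.0 or later.

```r
library(mice)

# Hypothetical toy data standing in for data_predict:
# two blocks of three Likert-type items each.
set.seed(1)
toy <- as.data.frame(matrix(sample(1:5, 60, replace = TRUE), ncol = 6))
names(toy) <- c("p1", "p2", "p3", "l1", "l2", "l3")

# Default predictor matrix: each row is a column to be imputed,
# each column is a potential predictor (1 = used, 0 = not used).
pred_default <- make.predictorMatrix(toy)
print(pred_default)
```

In the printed matrix, every off-diagonal entry is 1, meaning each column is predicted from all the others regardless of which construct they belong to.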
Hope you can help me out and I appreciate any help. Thanks in advance!
Tobias
Upvotes: 1
Views: 1158
So Dominix' suggestion of simply running separate imputations seems to be the right way to go. Thanks a lot!
For future reference, this is how I worked it out:
# Impute each block of 6 columns separately, so that only the
# items of the same latent variable are used as predictors.
test_pmm_firstv <- mice(data_predict[, 1:6],
                        m = 10,
                        maxit = 20,
                        method = "pmm",
                        seed = 127493)
test_pmm_secondv <- mice(data_predict[, 7:12],
                         m = 10,
                         maxit = 20,
                         method = "pmm",
                         seed = 1239754111)
test_pmm_thirdv <- mice(data_predict[, 13:18],
                        m = 10,
                        maxit = 20,
                        method = "pmm",
                        seed = 1238603)
test_pmm_fourthv <- mice(data_predict[, 19:24],
                         m = 10,
                         maxit = 20,
                         method = "pmm",
                         seed = 356811)

# Extract the first of the 10 imputed data sets from each block
data_pmm_firstv <- mice::complete(test_pmm_firstv, 1)
data_pmm_secondv <- mice::complete(test_pmm_secondv, 1)
data_pmm_thirdv <- mice::complete(test_pmm_thirdv, 1)
data_pmm_fourthv <- mice::complete(test_pmm_fourthv, 1)

# Put the four blocks back together and check that no NAs remain
data_fixed <- as.data.frame(cbind(data_pmm_firstv, data_pmm_secondv, data_pmm_thirdv, data_pmm_fourthv))
anyNA(data_fixed)
[1] FALSE
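The same restriction can probably also be expressed in a single mice() call by passing a block-diagonal predictorMatrix, so that each column is predicted only from the other columns of its own block. This is a rough sketch and not what I actually ran: toy and block_id are hypothetical names, and the toy data has two blocks of three items instead of four blocks of six.

```r
library(mice)

# Hypothetical toy data: two blocks of three items, with a few NAs.
set.seed(42)
m_toy <- matrix(sample(1:5, 180, replace = TRUE), ncol = 6,
                dimnames = list(NULL, c("p1", "p2", "p3", "l1", "l2", "l3")))
m_toy[sample(length(m_toy), 10)] <- NA
toy <- as.data.frame(m_toy)

# Block membership of each column; for the real data this would be
# rep(1:4, each = 6).
block_id <- rep(1:2, each = 3)

# Block-diagonal predictor matrix: 1 only where the target (row) and
# the predictor (column) belong to the same block, 0 on the diagonal.
pred <- outer(block_id, block_id, "==") * 1
diag(pred) <- 0
dimnames(pred) <- list(names(toy), names(toy))

imp <- mice(toy,
            predictorMatrix = pred,
            m = 3,
            maxit = 5,
            method = "pmm",
            seed = 2374,
            printFlag = FALSE)

completed <- complete(imp, 1)
anyNA(completed)
```

With this approach there is one mids object and one seed instead of four, and no cbind() step is needed afterwards. Note that complete(imp, 1), like the code above it, only uses the first of the m imputed data sets.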
Upvotes: 1