Reputation: 1355
I split a dataset into men and women and then imputed each part separately using the mice package.
#Generate predictormatrix
pred_gender_0<-quickpred(data_gender_0, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)
pred_gender_1<-quickpred(data_gender_1, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)
#impute the data with mice
imp_pred_gen0 <- mice(data_gender_0,
pred=pred_gender_0,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000) # I had to set this to 3000 because of a problematic unordered categorical variable
imp_pred_gen1 <- mice(data_gender_1,
pred=pred_gender_1,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000)
Now I have two objects with 10 imputed datasets each, one for men and one for women. My question is: how do I combine them? Normally, I would just use:
comp_imp<-complete(imp,"long")
Should I use rbind.mids() to combine the data of men and women and then convert the result to long format, or should I first convert each object to long format and then combine them with rbind.mids() or rbind()?
Thanks for any hints! =)
library("dplyr")
library("mice")
# We use nhanes-dataset from the mice-package as example
# first: combine age-category 2 and 3 to get two groups (as example)
nhanes$age[nhanes$age == 3] <- 2  # merge categories 2 and 3; age stays numeric
nhanes$hyp<-as.factor(nhanes$hyp)
#split data into two groups
nhanes_age_1<-nhanes %>% filter(age==1)
nhanes_age_2<-nhanes %>% filter(age==2)
#generate predictormatrix
pred1<-quickpred(nhanes_age_1, mincor=0.1, inc=c('age','bmi'), exc='chl')
pred2<-quickpred(nhanes_age_2, mincor=0.1, inc=c('age','bmi'), exc='chl')
# separately impute data
set.seed(121012)
imp_gen1 <- mice(nhanes_age_1,
pred=pred1,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000)
imp_gen2 <- mice(nhanes_age_2,
pred=pred2,
m=10,
maxit=5,
diagnostics=TRUE,
MaxNWts=3000)
#------ ALTERNATIVE 1:
#combine imputed data
combined_imp<-rbind.mids(imp_gen1,imp_gen2)
complete_imp<-complete(combined_imp,"long")
#output
> combined_imp<-rbind.mids(imp_gen1,imp_gen2)
Warning messages:
1: In rbind.mids(imp_gen1, imp_gen2) :
Predictormatrix is not equal in x and y; y$predictorMatrix is ignored
.
2: In x$visitSequence == y$visitSequence :
longer object length is not a multiple of shorter object length
3: In rbind.mids(imp_gen1, imp_gen2) :
Visitsequence is not equal in x and y; y$visitSequence is ignored
.
> complete_imp<-complete(combined_imp,"long")
Error in inherits(x, "mids") : object 'combined_imp' not found
#------ ALTERNATIVE 2:
complete_imp1<-complete(imp_gen1,"long")
complete_imp2<-complete(imp_gen2,"long")
combined_imp<-rbind.mids(complete_imp1,complete_imp2)
#Output
> complete_imp1<-complete(imp_gen1,"long")
> complete_imp2<-complete(imp_gen2,"long")
> combined_imp<-rbind.mids(complete_imp1,complete_imp2)
Error in if (ncol(y) != ncol(x$data)) stop("The two datasets do not have the same number of columns\n") :
argument is of length zero
Upvotes: 0
Views: 3226
Reputation: 268
You can use the following to create a new mids object that contains the 10 imputed datasets of both men and women:
comp_imp <- rbind(imp_pred_gen0, imp_pred_gen1)
Because the arguments are mids objects, this calls rbind.mids rather than the regular rbind function in R. Both objects need the same number of imputations (m) and the same columns. The returned object can be analysed in the usual way, e.g. using with() (i.e. with.mids) to fit your desired model to each of the imputed datasets and pool() to combine the results.
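A minimal, self-contained sketch of this approach, assuming the nhanes example data from mice and a hypothetical grouping (alternating rows) standing in for gender; the names part1, part2, imp1, imp2 and imp_all are illustrative only. It imputes each group separately with the same m, stacks the two mids objects with rbind(), and then fits and pools a model on the combined object. mice may print warnings (for example about logged events); as the warnings in the question show, rbind.mids keeps the first object's predictorMatrix and visitSequence.

```r
library(mice)

# Hypothetical grouping variable standing in for gender: alternate rows of nhanes
grp <- rep(c(1, 2), length.out = nrow(nhanes))
part1 <- nhanes[grp == 1, ]
part2 <- nhanes[grp == 2, ]

# Impute each group separately; both runs must use the same m
imp1 <- mice(part1, m = 5, maxit = 5, seed = 1, printFlag = FALSE)
imp2 <- mice(part2, m = 5, maxit = 5, seed = 2, printFlag = FALSE)

# With mice loaded, rbind() on mids objects dispatches to rbind.mids()
imp_all <- rbind(imp1, imp2)

# Fit the model on each of the 5 completed datasets and pool the results
fit <- with(imp_all, lm(chl ~ bmi + age))
summary(pool(fit))
```

The combined object can also be turned into a single long data frame afterwards with complete(imp_all, "long").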
Upvotes: 2
Reputation: 908
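complete_imp1 <- complete(imp_gen1, "long")
already returns all 10 (the m parameter) imputed data frames, stacked on top of each other and labelled by the .imp column, so the number of rows of complete_imp1 equals the number of rows of nhanes_age_1 multiplied by m.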
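A quick way to see this, sketched on the nhanes example data shipped with mice (the object names imp and long are illustrative): the long format holds one block of rows per imputation, labelled by .imp.

```r
library(mice)

imp <- mice(nhanes, m = 10, maxit = 5, seed = 123, printFlag = FALSE)
long <- complete(imp, "long")

# Ten stacked copies of the 25-row nhanes data, one per imputation
nrow(long)                            # 250
nrow(long) == imp$m * nrow(nhanes)    # TRUE
table(long$.imp)                      # 25 rows for each of .imp = 1, ..., 10
```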
Upvotes: 0
Reputation: 3311
I honestly have no knowledge of the package mice and only a faint idea about the concept of imputation.
I don't know what kind of analysis you would like to perform, but you say that normally you would do comp_imp<-complete(imp,"long"), so I'll try to answer accordingly.
For me the first approach returns a data.frame, but without any missings. That is weird, since in complete(imp_gen1,"long") there is missing data in hyp. I don't know what rbind.mids is doing there.
I would therefore go with your second approach.
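The result of complete(., "long") is an ordinary data.frame, so there is no need to bind it with rbind.mids. In fact, rbind.mids() expects a mids object as its first argument, which is why your second approach fails with "argument is of length zero".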
I would change your second approach to:
library(dplyr)
complete_imp1 <- complete(imp_gen1, "long")
complete_imp2 <- complete(imp_gen2, "long")
combined_imp <- bind_rows(complete_imp1, complete_imp2)
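If you later want to use with() and pool() on the combined data, one option is to keep the original (incomplete) rows with include = TRUE and turn the stacked data frame back into a mids object with as.mids(). A minimal sketch, assuming the nhanes example and a hypothetical alternating-row grouping in place of gender; base rbind() is used here, but dplyr::bind_rows() works the same way on these ordinary data frames. Because .id may not be unique once the two parts are stacked, it is renumbered here to be safe.

```r
library(mice)

# Hypothetical grouping variable standing in for gender: alternate rows of nhanes
grp <- rep(c(1, 2), length.out = nrow(nhanes))
imp1 <- mice(nhanes[grp == 1, ], m = 5, seed = 1, printFlag = FALSE)
imp2 <- mice(nhanes[grp == 2, ], m = 5, seed = 2, printFlag = FALSE)

# include = TRUE adds the original incomplete data as .imp == 0,
# which as.mids() needs to rebuild a mids object
long1 <- complete(imp1, "long", include = TRUE)
long2 <- complete(imp2, "long", include = TRUE)

# Ordinary data frames, so a plain row bind is enough
long_all <- rbind(long1, long2)

# Renumber .id within each imputation so every original row has a unique id
long_all$.id <- ave(long_all$.imp, long_all$.imp, FUN = seq_along)

# Rebuild a mids object, then analyse and pool as usual
imp_all <- as.mids(long_all)
fit <- with(imp_all, lm(chl ~ bmi + age))
summary(pool(fit))
```

Stacking keeps the part-1 rows before the part-2 rows inside every .imp block, so the imputed rows stay aligned with the original (.imp == 0) rows that as.mids() uses as the reference.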
Upvotes: 0