SEMson

Reputation: 1355

How to combine multiply imputed data with mice?

I split a dataset into men and women, and then separately imputed it using the mice package.

#Generate predictormatrix
pred_gender_0<-quickpred(data_gender_0, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)
pred_gender_1<-quickpred(data_gender_1, include=c("age","weight_trunc"),exclude=c("ID","X","gender"),mincor = 0.1)

#impute the data with mice 
imp_pred_gen0 <- mice(data_gender_0,
                 pred=pred_gender_0,
                 m=10,
                 maxit=5,            
                 diagnostics=TRUE,
                 MaxNWts=3000) # I had to set this to 3000 because of a problematic unordered categorical variable

imp_pred_gen1 <- mice(data_gender_1,
                 pred=pred_gender_1,
                 m=10,
                 maxit=5,            
                 diagnostics=TRUE,
                 MaxNWts=3000)

Now I have two objects, each holding 10 imputed datasets: one for men, one for women. My question is, how do I combine them? Normally, I would just use:

comp_imp<-complete(imp,"long")

Should I:

  1. use rbind.mids() to combine the men's and women's data first and then convert the result to long format, or
  2. convert each object to long format first and then combine them with rbind.mids() or rbind()?

Thanks for any hints! =)

---------------------------------------------------------------------------

UPDATE - REPRODUCIBLE EXAMPLE

library("dplyr")
library("mice")

# We use nhanes-dataset from the mice-package as example

# first: combine age-category 2 and 3 to get two groups (as example)
nhanes$age[nhanes$age == 3] <- 2   # assign a number so age stays numeric
nhanes$hyp<-as.factor(nhanes$hyp)

#split data into two groups
nhanes_age_1<-nhanes %>% filter(age==1)
nhanes_age_2<-nhanes %>% filter(age==2)

#generate predictormatrix
pred1<-quickpred(nhanes_age_1, mincor=0.1, inc=c('age','bmi'), exc='chl')
pred2<-quickpred(nhanes_age_2, mincor=0.1, inc=c('age','bmi'), exc='chl')

# separately impute data
set.seed(121012)
imp_gen1 <- mice(nhanes_age_1,
                 pred=pred1,
                 m=10,
                 maxit=5,            
                 diagnostics=TRUE,
                 MaxNWts=3000)

imp_gen2 <- mice(nhanes_age_2,
                 pred=pred2,
                 m=10,
                 maxit=5,            
                 diagnostics=TRUE,
                 MaxNWts=3000)


#------ ALTERNATIVE 1:

#combine imputed data
combined_imp<-rbind.mids(imp_gen1,imp_gen2)
complete_imp<-complete(combined_imp,"long")

#output
   > combined_imp<-rbind.mids(imp_gen1,imp_gen2)
Warning messages:
1: In rbind.mids(imp_gen1, imp_gen2) :
  Predictormatrix is not equal in x and y; y$predictorMatrix is ignored
.
2: In x$visitSequence == y$visitSequence :
  longer object length is not a multiple of shorter object length
3: In rbind.mids(imp_gen1, imp_gen2) :
  Visitsequence is not equal in x and y; y$visitSequence is ignored
.
   
> complete_imp<-complete(combined_imp,"long")
Error in inherits(x, "mids") : object 'combined_imp' not found


#------ ALTERNATIVE 2:

complete_imp1<-complete(imp_gen1,"long")
complete_imp2<-complete(imp_gen2,"long")
combined_imp<-rbind.mids(complete_imp1,complete_imp2)

#Output
> complete_imp1<-complete(imp_gen1,"long")
> complete_imp2<-complete(imp_gen2,"long")
> combined_imp<-rbind.mids(complete_imp1,complete_imp2)
Error in if (ncol(y) != ncol(x$data)) stop("The two datasets do not have the same number of columns\n") : 
  argument is of length zero

Upvotes: 0

Views: 3226

Answers (3)

Jonathan Bartlett

Reputation: 268

You can just use the following to create a new mids object which contains 10 imputed datasets of the men and women.

comp_imp <- rbind(imp_pred_gen0, imp_pred_gen1)

Note that the arguments are the two mids objects returned by mice(), not the predictor matrices. With mice loaded, this calls rbind.mids rather than the regular rbind function in R. The returned object can be analysed in the usual way, e.g. using with.mids to fit your desired model to each of the imputed datasets.
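
A minimal sketch of that workflow, assuming the built-in nhanes data split on age stands in for the men/women split. The object names, the seeds, m = 5, and the model bmi ~ chl are illustrative choices, not taken from the question.

```r
library(mice)

# Stand-in for the two groups (assumption: age == 1 vs. age > 1
# plays the role of men vs. women). Both groups keep the same columns.
grp_a <- nhanes[nhanes$age == 1, c("bmi", "chl")]
grp_b <- nhanes[nhanes$age > 1, c("bmi", "chl")]

# Impute each group separately with the same number of imputations.
imp_a <- mice(grp_a, m = 5, printFlag = FALSE, seed = 1)
imp_b <- mice(grp_b, m = 5, printFlag = FALSE, seed = 2)

# With mice loaded, rbind() on two mids objects stacks the rows
# within each imputation and returns a new mids object.
imp_all <- rbind(imp_a, imp_b)

# Fit the model in every completed dataset and pool the estimates.
fit <- with(imp_all, lm(bmi ~ chl))
summary(pool(fit))
```

Both objects need the same variables and the same m for this to work. If the predictor matrices differ between the groups, rbind.mids warns and keeps only the first one, as seen in the question's output.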

Upvotes: 2

Pablo Casas

Reputation: 908

complete_imp1 <- complete(imp_gen1, "long") already returns all 10 (the m parameter) imputed data frames stacked into one. You can check this by counting its rows: the total should equal the number of rows in the original data multiplied by m.
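
A quick way to check this, sketched on two columns of the built-in nhanes data (the object names and the seed are illustrative assumptions):

```r
library(mice)

# Illustrative data: two columns of the built-in nhanes dataset.
imp <- mice(nhanes[, c("bmi", "chl")], m = 10, printFlag = FALSE, seed = 1)

long <- complete(imp, "long")

# The long format stacks all m completed datasets and tags each row
# with its imputation number in the .imp column.
nrow(long) == nrow(nhanes) * imp$m   # stacked rows = original rows * m
table(long$.imp)                     # rows contributed by each imputation
```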

Upvotes: 0

Thomas K

Reputation: 3311

I honestly have no knowledge of the package mice and just a faint idea about the concept of imputation.

I don't know what kind of analysis you would like to perform, but you say that normally you would do: comp_imp<-complete(imp,"long"), so I'll try to answer accordingly.

For me the first approach returns a data.frame, but without any missing values. That is odd, since complete(imp_gen1,"long") still has missing data in hyp. I don't know what rbind.mids is doing there.

I would therefore go with your second approach.

The result from complete(., "long") is a normal data.frame, hence there is no need to bind it with rbind.mids.

I would change your second approach to:

library(dplyr)
complete_imp1 <- complete(imp_gen1, "long")
complete_imp2 <- complete(imp_gen2, "long")
combined_imp <- bind_rows(complete_imp1, complete_imp2)
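
As a rough sketch of how the stacked result could then be used, here is the same idea on a simplified split of nhanes. The group labels, seeds, m = 5, and the per-imputation summary are illustrative assumptions. Adding a group column before binding keeps rows from the two groups distinguishable after they are stacked.

```r
library(mice)
library(dplyr)

# Simplified stand-in for the two groups, keeping the same columns in both.
grp_a <- nhanes[nhanes$age == 1, c("bmi", "chl")]
grp_b <- nhanes[nhanes$age > 1, c("bmi", "chl")]

imp_a <- mice(grp_a, m = 5, printFlag = FALSE, seed = 1)
imp_b <- mice(grp_b, m = 5, printFlag = FALSE, seed = 2)

# Convert each mids object to long format and label its group.
long_a <- complete(imp_a, "long")
long_a$group <- "age 1"
long_b <- complete(imp_b, "long")
long_b$group <- "age 2+"

# Plain data frames, so an ordinary row bind is enough.
combined <- bind_rows(long_a, long_b)

# Example: one summary row per imputed dataset.
combined %>%
  group_by(.imp) %>%
  summarise(mean_bmi = mean(bmi), n = n())
```

The combined object is an ordinary data frame, not a mids object, so with() and pool() from mice no longer apply to it directly.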

Upvotes: 0
