I am trying to use parallel computation to compute percentile bootstrap 95% confidence intervals for least absolute deviations regression parameters, as explained in this article. However, I am not using a single data frame but a multiply imputed data set (mids) object, obtained with the mice package for multiple imputation. This is where the problem lies.
I would like to use the mids object (or a list of multiply imputed data sets) in a foreach loop, perform the bootstrapping, and pool the results. I managed to get results based on a single data set by converting the mids object into a list and then using one element of that list. However, I would like to use all data sets at once.
A reproducible example:
library(foreach)
library(doParallel)
cores_2_use <- detectCores() - 1
cl <- makeCluster(cores_2_use)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl)
library(mice)
imp_merged <-
  foreach(no = 1:cores_2_use,
          .combine = ibind,
          .export = "nhanes",
          .packages = "mice") %dopar%
  {
    mice(nhanes, m = 30, printFlag = FALSE)
  }
stopCluster(cl)
And here is what I have tried:
library(quantreg)
library(mitml)
library(miceadds)
library(splines)
cl <- makeCluster(cores_2_use)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl)
boot.1 <- foreach(i = 1:100,
                  .combine = rbind,
                  .packages = c('quantreg', 'mice', 'mitml', 'splines')) %dopar% {
  longlist <- miceadds::mids2datlist(imp_merged)
  boot_dat <- longlist[[6]][sample(1:nrow(longlist[[6]]), replace = TRUE), ]
  ## This is now based only on the 6th element of longlist
  ## I would like to use the whole mids/longlist object (330 data sets on my PC)
  fit1 <- rq(chl ~ ns(bmi, df = 2, B = c(21, 33)) +
               hyp + age, tau = 0.5,
             data = boot_dat)
  fit1$coef
}
stopCluster(cl)
boot.1.df <- as.data.frame(boot.1)
boot.1.pooled <- do.call(cbind, boot.1.df)
boot.1.ci <- apply(boot.1.pooled, 2, quantile, probs = c(0.025, 0.975))
t(boot.1.ci)
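Since .combine = rbind already returns a numeric matrix with one named column per coefficient, the as.data.frame() and do.call(cbind, ...) round trip should not be needed: the percentiles can be taken column-wise on the matrix itself. A minimal sketch with simulated numbers, where boot_coefs is a hypothetical stand-in for boot.1 and the values are made up for illustration:

```r
# Simulated stand-in for boot.1: one row per bootstrap replicate, one named
# column per coefficient (the shape foreach(..., .combine = rbind) returns
# when every iteration yields a named coefficient vector). Illustrative only.
set.seed(123)
boot_coefs <- do.call(rbind, lapply(1:200, function(i)
  c(`(Intercept)` = rnorm(1, mean = 150, sd = 10),
    age = rnorm(1, mean = 20, sd = 5))))

# Column-wise 2.5% and 97.5% quantiles = percentile bootstrap 95% CI,
# transposed so that each row is one coefficient
boot_ci <- t(apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975)))
boot_ci
```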
I converted the mids object into a list of multiply imputed data sets with longlist <- miceadds::mids2datlist(imp_merged) and performed the sampling based on a single element (i.e., imputed data set) of that list through boot_dat <- longlist[[6]][sample(1:nrow(longlist[[6]]), replace = TRUE), ]. I would like to use the whole mids object or all elements of longlist.
Any help will be much appreciated!
One possible way is to combine all imputed data sets into one big data set and sample from it directly.
## stack all imputed data sets into one data frame
longlist_ <- do.call(rbind, longlist)
## draw as many rows as one imputed data set has, from the whole stack
boot_dat <- longlist_[sample(nrow(longlist_), nrow(longlist[[1]]), replace = TRUE), ]
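The stacking idea can be wrapped in a small function and called inside the foreach body in place of the longlist[[6]] line, e.g. boot_dat <- boot_stacked(miceadds::mids2datlist(imp_merged)). This is only a sketch: boot_stacked and the toy data are hypothetical, and it assumes every imputed data set has the same number of rows (which holds for imputations of the same original data).

```r
# Hypothetical helper: stack all imputed data sets and draw one bootstrap
# sample of the original sample size from the stacked data.
boot_stacked <- function(datlist) {
  stacked <- do.call(rbind, datlist)  # all imputations, one below the other
  n <- nrow(datlist[[1]])             # rows in one imputed data set
  stacked[sample(nrow(stacked), n, replace = TRUE), , drop = FALSE]
}

# Toy stand-in for miceadds::mids2datlist() output: three "imputations"
# of a 5-row data frame that differ only in column x.
toy_list <- lapply(1:3, function(m) data.frame(id = 1:5, x = m * 10 + 1:5))
set.seed(1)
boot_dat <- boot_stacked(toy_list)
```

Sampling with replacement from the stack keeps the bootstrap sample at the original size n rather than at M times n, so the spread of the estimates still reflects n observations.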
Another way is to randomly choose a data set, randomly choose a row from it, and repeat this as many times as one data set has rows.
boot_dat = NULL
for (j in seq(nrow(longlist[[6]])))
{
  ## pick a random imputed data set, then a random row from it
  boot_dat = rbind(boot_dat,
                   longlist[[sample(length(longlist), 1)]][sample(nrow(longlist[[1]]), 1), ])
}
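Growing boot_dat one rbind() per row gets slow for larger data. A vectorised sketch of the same two-stage draw, where boot_two_stage and toy_list are hypothetical names and all imputed data sets are again assumed to have the same rows:

```r
# Hypothetical vectorised version of the two-stage draw: pick all data-set
# indices and all row indices at once, then bind the chosen rows in one go.
boot_two_stage <- function(datlist) {
  n <- nrow(datlist[[1]])                                  # rows per imputed data set
  which_set <- sample(length(datlist), n, replace = TRUE)  # data set for each draw
  which_row <- sample(n, n, replace = TRUE)                # row for each draw
  rows <- Map(function(k, r) datlist[[k]][r, , drop = FALSE], which_set, which_row)
  do.call(rbind, rows)
}

# Toy stand-in for the list of imputed data sets
toy_list <- lapply(1:3, function(m) data.frame(id = 1:5, x = m * 10 + 1:5))
set.seed(2)
boot_dat <- boot_two_stage(toy_list)
```

Because every data set has n rows, choosing one of the M data sets uniformly and then a row uniformly gives each row of the stacked data the same probability 1/(M n). This two-stage draw therefore samples from the same distribution as sampling n rows from the stacked data set.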
Note that to avoid the "Singular design matrix" error in rq, a small amount of noise can be added, for example to hyp:
boot_dat[,'hyp'] = boot_dat[,'hyp'] + runif(nrow(boot_dat), -1e-10, 1e-10)