I am working with multiply imputed complex survey data and trying to estimate CIs for a proportion using Thomas Lumley's survey and mitools packages, in particular the svyciprop() function with the beta method. The beta (Korn-Graubard) method is often recommended for proportions estimated from complex survey data, and I'm using MIcombine() from the mitools package to combine the estimates across imputations. However, MIcombine() does not report CIs; it gives only the pooled estimate and SE.
Here's a simplified example using a dummy dataset:
library(survey)   # svydesign(), svyciprop()
library(mitools)  # MIcombine()
# Dummy data: using mtcars dataset
data("mtcars")
# Creating multiple imputations (here just three identical copies, for illustration)
imputed_data <- list(mtcars, mtcars, mtcars)
# Build a survey design for each imputed dataset and estimate the proportion with the beta method
prop_estimates <- lapply(imputed_data, function(data) {
  design <- svydesign(id = ~cyl, weights = ~wt, data = data, nest = TRUE)
  svyciprop(~I(am == 1), design, method = "beta")
})
mitools::MIcombine(prop_estimates)
I have tried other approaches, such as calling confint() on the MIcombine() result, but that did not work either. I have also tried summary() on the MIcombine() result, but it gives exactly the same CIs regardless of which CI method (such as "beta" or "likelihood") I pass to svyciprop().
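A likely reason summary() gives identical intervals: MIcombine() pools only the point estimate (coef()) and the variance (vcov()) from each svyciprop() result and then forms a Rubin's-rules t interval, so the CI method chosen in svyciprop() largely never reaches the pooled interval. As a rough sketch, and not a validated multiple-imputation method, one could mimic the beta method's effective-sample-size idea using the pooled estimate and pooled total variance. The helper beta_ci_from_pooled() is hypothetical (not part of survey or mitools), and it omits any degrees-of-freedom adjustment that svyciprop() may apply to the effective sample size.

```r
# Hypothetical helper (not part of survey or mitools): beta-type
# (Clopper-Pearson-style) interval for a pooled proportion p with pooled
# total variance v, using an effective sample size in the spirit of the
# Korn-Graubard approach. No degrees-of-freedom adjustment is applied.
beta_ci_from_pooled <- function(p, v, level = 0.95) {
  alpha <- 1 - level
  n_eff <- p * (1 - p) / v   # effective sample size implied by the variance
  x_eff <- n_eff * p         # effective number of "successes"
  c(lower = qbeta(alpha / 2, x_eff, n_eff - x_eff + 1),
    upper = qbeta(1 - alpha / 2, x_eff + 1, n_eff - x_eff))
}

# Example with made-up numbers: pooled proportion 0.4, total variance 0.0025
beta_ci_from_pooled(0.4, 0.0025)
```

With the question's objects, one possible call is `pooled <- MIcombine(prop_estimates)` followed by `beta_ci_from_pooled(as.numeric(pooled$coefficients), as.numeric(pooled$variance))`. This sketch makes no claim about the coverage of plugging the Rubin's-rules total variance into a beta interval.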
One possible solution is to pool the survey statistics with the pool_prop_wilson() function from the miceafter package. For each imputed dataset, this function takes the mean, the Wald standard error and the complete-data degrees of freedom (unweighted sample size minus 1), and pools them using the method published by Lott & Reiter (2020). The mean and standard error can, however, be replaced with design-based estimates from svymean() before pooling. I have included a reproducible example below using the mtcars dataset.
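For readers unfamiliar with the Wilson interval, here is a minimal base-R sketch of the ordinary single-sample Wilson score interval that Lott & Reiter (2020) extend to multiply imputed data. The function wilson_ci() is hypothetical and not part of miceafter; it works on one dataset only and does not account for the between-imputation variance that the pooled version has to handle.

```r
# Hypothetical helper (not part of miceafter): Wilson score interval for a
# single, non-imputed sample with observed proportion p and sample size n.
wilson_ci <- function(p, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  denom <- 1 + z^2 / n
  center <- (p + z^2 / (2 * n)) / denom                       # shrunk toward 0.5
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom # half-width
  c(lower = center - half, upper = center + half)
}

wilson_ci(0.5, 100)
```

Unlike the Wald interval, the Wilson interval stays inside [0, 1] and does not collapse to zero width when p is 0 or 1, which is one reason it is attractive for proportions.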
# Reproducible dummy data using the mtcars dataset
data("mtcars")
mtcars$am[1:5] <- NA # added some missing values
library(mice) # for multiple imputation of missing values
imputed_mids <- mice(mtcars, m = 5, method = "pmm", seed = 2024)
imputed_list <- lapply( seq(imputed_mids$m), function(im) complete(imputed_mids, im) )
library(survey); library(mitools) # for a survey design object with multiple imputation
design_imputed <- svydesign(id = ~cyl, weights = ~wt, data = imputationList(imputed_list), nest = TRUE)
# Calculate pooled confidence intervals for the proportion (without accounting for the survey design)
library(miceafter) # for binomial proportion confidence intervals with multiple imputation
imputed_dataframe <- dplyr::bind_rows(imputed_list, .id = "Impnr")
imputed_milist <- df2milist(imputed_dataframe, impvar = "Impnr")
imputed_stats <- with(imputed_milist, expr = prop_wald(am ~ 1))
pool_prop_wilson(imputed_stats, conf.level = 0.95)
# Replace the imputation statistics with design-based estimates from the "survey" package
svymean_stats <- with(design_imputed, svymean(~I(am == 1)) )
svy_imputed_stats <- imputed_stats
for (i in seq_along(svymean_stats)) {
  # svymean() on a logical returns FALSE and TRUE levels; element [[2]] is the TRUE proportion
  svy_imputed_stats[["statistics"]][[i]][[1]] <- coef(svymean_stats[[i]])[[2]]  # replace mean
  svy_imputed_stats[["statistics"]][[i]][[2]] <- SE(svymean_stats[[i]])[[2]]    # replace standard error
}
# Recalculate the pooled Wilson CI including the design-based estimates
pool_prop_wilson(svy_imputed_stats, conf.level = 0.95)
# Alternative approaches for pooling/CIs
pool_prop_nna(svy_imputed_stats, conf.level = 0.95) #miceafter using approximate Beta distribution
pool_prop_wald(svy_imputed_stats, conf.level = 0.95) #miceafter using Wald method (Rubin's Rules)
summary( MIcombine(svymean_stats) ) #mitools pooled estimate for svymean
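The Wald/Rubin's-rules pooling mentioned in the last two lines can be written out by hand, which may make it easier to see what gets combined. This is a minimal base-R sketch: rubin_pool() is a hypothetical helper that follows the standard Rubin (1987) formulas for a scalar estimate, and it is not guaranteed to match every detail of pool_prop_wald() or MIcombine().

```r
# Hypothetical helper: pool m point estimates and their standard errors
# with Rubin's rules for a single (scalar) parameter.
rubin_pool <- function(est, se, level = 0.95) {
  m <- length(est)
  qbar <- mean(est)                 # pooled point estimate
  w <- mean(se^2)                   # within-imputation variance
  b <- var(est)                     # between-imputation variance
  t_var <- w + (1 + 1 / m) * b      # total variance
  r <- (1 + 1 / m) * b / w          # relative increase in variance due to missingness
  df <- (m - 1) * (1 + 1 / r)^2     # Rubin's degrees of freedom (Inf when b == 0)
  q <- qt(1 - (1 - level) / 2, df)
  c(estimate = qbar, se = sqrt(t_var), df = df,
    lower = qbar - q * sqrt(t_var), upper = qbar + q * sqrt(t_var))
}

rubin_pool(c(0.40, 0.42, 0.38), c(0.05, 0.05, 0.05))
```

When every imputation is identical, as in the question's dummy example, the between-imputation variance is zero, the degrees of freedom are infinite, and the interval reduces to an ordinary normal-based Wald interval.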
As far as I know, the author of the survey and mitools packages hasn't implemented a svyciprop method for multiply imputed data. In general, I comment out svyciprop, svyttest, and svychisq calls when working with multiply imputed microdata (NHIS, for example). There's a hack version called MIsvyciprop in the (abandoned) lodown package; it may not be publishable or defensible, but it might help to see how such a method could work.