Clar_k
Clar_k

Reputation: 51

Using mice inputed data sets in GLM analysis; can pooled model fit indices be obtained?

I used mice to impute five missing data sets, saved as the object "allImputations" in the code below. I then needed to complete linear and dichotomous regression analyses across the imputed data sets (see below for a successful example):

SIStep2a<-with(allImputations, lm(Y1~X1+X2+X3)) 
SIStep2a<-as.mira(SIStep2a)
summary(pool(SIStep2a))
pool.r.squared(SIStep2a, adjusted = FALSE)

The above code above provides all the information I need for linear models, but I run into problems when I use glm to perform a logit regression in the same dataset(s).

treat.Step1a<-with(allImputations, glm(Y2~X1+X2+X3, family=binomial))
treat.Step1a<-as.mira(treat.Step1a)
summary(pool(treat.Step1a))

In this instance, I need a pooled pseudo R2 or other pooled model fit index (similar to the pool.r.squared function). However, I cannot find a way to produce either pooled model fit indices OR the fit indices for each analysis of the five imputed data sets.

Essentially, is there a pool.r.squared analog for glm analyses across multiply imputed datasets from mice? Or is there a longhand way to calculate this via the info in the saved object "treat.Step1a" above? Or is there a way to isolate fit indices for each of the five analyses completed for each imputed data set?

Update

I was able to download a package directly from GitHub (glmice), which was no longer available via CRAN. However, the command mcf() would not successfully execute in my current R Studio version.

I ultimately ran each step of the model (i.e. as I added each block of variables) across all five imputed data sets and averaged the McFadden's R2 of all five imputed datasets to very roughly assess the improvement in Pseudo R2.

Is this an acceptable middle ground approach?

Upvotes: 5

Views: 1081

Answers (1)

hanne
hanne

Reputation: 102

The glm() function does not yield $R^2$ values in the summary(). Therefore, the pool.r.squared() function cannot be used. However, the transformation implemented in that function (proposed in Harel, 2009) may be manually performed on pseudo $R^2$ values. In the example below, the pseudo $R^2$ cf. this response is used.

# setup
library(mice, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
pseudo_r2 <- function(deviance, null.deviance){1 - deviance/null.deviance}

# data
dat <- nhanes
dat$hyp <- dat$hyp - 1

# impute and fit
imp <- mice(dat, print = FALSE)
fit <- with(imp, glm(hyp ~ age + bmi, family = "binomial"))

# extract transformed r2
est <- purrr::map_dfr(fit$analyses, ~{
  broom::glance(.x) %>% 
    mutate(r2 = pseudo_r2(.$deviance, .$null.deviance),
           r = sqrt(r2),
           z =  0.5 * log((r + 1) / (1 - r)),
           df = 1 / (nobs - 3))
})

# pool transformed values
pool <- pool.scalar(est$z, est$df)

### copied from mice::pool.r.suared()
# Make table with results
qbar <- pool$qbar
table <- array(((exp(2 * qbar) - 1) / (1 + exp(2 * qbar)))^2,
               dim = c(1, 4)
)
dimnames(table) <- list("R^2", c("est", "lo 95", "hi 95", "fmi"))
table[, 2] <- ((exp(2 * (qbar - 1.96 * sqrt(fit$t))) - 1) / (1 + exp(2 * (qbar - 1.96 * sqrt(fit$t)))))^2
#> Error in sqrt(fit$t): non-numeric argument to mathematical function
table[, 3] <- ((exp(2 * (qbar + 1.96 * sqrt(fit$t))) - 1) / (1 + exp(2 * (qbar + 1.96 * sqrt(fit$t)))))^2
#> Error in sqrt(fit$t): non-numeric argument to mathematical function
table[, 4] <- fit$f
#> Error in table[, 4] <- fit$f: number of items to replace is not a multiple of replacement length
table
#>           est     lo 95     hi 95       fmi
#> R^2 0.3599957 0.3599957 0.3599957 0.3599957
###

Created on 2023-10-04 with reprex v2.0.2

Upvotes: 1

Related Questions