Daniel G
Daniel G

Reputation: 51

How do I calculate cronbach's alpha on multiply imputed data?

I have run a multiple imputation (m=45, 10 iterations) using the MICE package, and want to calculate the cronbach's alpha for a number of ordinal scales in the data. Is there a function in r that could assist me in calculating the alpha coefficient across the imputed datasets in a manner that would satisfy Rubin's rules for pooling estimates?

Upvotes: 2

Views: 803

Answers (1)

Dion Groothof
Dion Groothof

Reputation: 1456

We may exploit pool.scalar from the mice package, which performs pooling of univariate estimates according to Rubin's rules.

Since you have not provided a reproducible example yourself, I will provide one.

set.seed(123)

# sample survey responses
df <- data.frame(
  x1 = c(1,2,2,3,2,2,3,3,2,3,
         1,2,2,3,2,2,3,3,2,3,
         1,2,2,3,2,2,3,3,2,3),
  x2 = c(1,1,1,2,3,3,2,3,3,3,
         1,1,1,2,3,3,2,3,3,3,
         1,2,2,3,2,2,3,3,2,3),
  x3 = c(1,1,2,1,2,3,3,3,2,3,
         1,1,2,1,2,3,3,3,2,3,
         1,2,2,3,2,2,3,3,2,3)
)

# function to column-wise generate missing values (MCAR)
create_missings <- function(data, prob) {
  x <- replicate(ncol(data),rbinom(nrow(data), 1, prob))
  for(k in 1:ncol(data)) {
    data[, k] <- ifelse(x[, k] == 1, NA, data[,k])
  }
  data
}
df <- create_missings(df, prob = 0.2)

# multiple imputation ----------------------------------

library(mice)
imp <- mice(df, m = 10, maxit = 20)

# extract the completed data in long format
implong <- complete(imp, 'long')

We need a function to compute cronbach's alpha and obtain an estimate of the standard error of alpha, which can be used in a call to pool.scalar() later on. Since there is no available formula with which we can analytically estimate the standard error of alpha, we also need to deploy a bootstrapping procedure to estimate this standard error.

The function cronbach_fun() takes the following arguments:

  • list_compl_data: a character string specifying the list of completed data from a mids object.
  • boot: a logical indicating whether a non-parametrical bootstrap should be conducted.
  • B: an integer specifying the number of bootstrap samples to be taken.
  • ci: a logical indicating whether a confidence interval around alpha should be estimated.
cronbach_fun <- function(list_compl_data, boot = TRUE, B = 1e4, ci = FALSE) {
  n <- nrow(list_compl_data); p <- ncol(list_compl_data)
  total_variance <- var(rowSums(list_compl_data))
  item_variance <- sum(apply(list_compl_data, 2, sd)^2)
  alpha <- (p/(p - 1)) * (1 - (item_variance/total_variance))
  out <- list(alpha = alpha)
  boot_alpha <- numeric(B)
  if (boot) {
    for (i in seq_len(B)) {
      boot_dat <- list_compl_data[sample(seq_len(n), replace = TRUE), ]
      total_variance <- var(rowSums(boot_dat))
      item_variance <- sum(apply(boot_dat, 2, sd)^2)
      boot_alpha[i] <- (p/(p - 1)) * (1 - (item_variance/total_variance))
    }
    out$var <- var(boot_alpha)
  }
  if (ci){
    out$ci <- quantile(boot_alpha, c(.025,.975))
  }
  return(out)
}

Now that we have our function to do the 'heavy lifting', we can run it on all m completed data sets, after which we can obtain Q and U (which are required for the pooling of the estimates). Consult ?pool.scalar for more information.

m <- length(unique(implong$.imp))
boot_alpha <- rep(list(NA), m)
for (i in seq_len(m)) {
  set.seed(i) # fix random number generator
  sub <- implong[implong$.imp == i, -c(1,2)]
  boot_alpha[[i]] <- cronbach_fun(sub)
}

# obtain Q and U (see ?pool.scalar)
Q <- sapply(boot_alpha, function(x) x$alpha)
U <- sapply(boot_alpha, function(x) x$var)

# pooled estimates
pool_estimates <- function(x) {
  out <- c(
    alpha = x$qbar,
    lwr = x$qbar - qt(0.975, x$df) * sqrt(x$t),
    upr = x$qbar + qt(0.975, x$df) * sqrt(x$t)
  )
  return(out)
}

Output

# Pooled estimate of alpha (95% CI)
> pool_estimates(pool.scalar(Q, U))
    alpha       lwr       upr 
0.7809977 0.5776041 0.9843913

Upvotes: 4

Related Questions