Running Latent Class Growth Analysis on multiply imputed datasets

Question

For the data analysis in the project I'm working on, I have to perform a latent class growth analysis (LCGA) to identify different trajectories of my outcome variable over time. Since a number of variables have missing values, I first use multiple imputation to construct complete datasets, which I can then use for the LCGA. I have worked out most of the problems I ran into while learning how to perform the imputations and the LCGA, but I can't figure out how to run the LCGA on the imputed datasets and then pool the results to obtain trajectories based on the pooled results.

Packages used:

library(tidyverse)
library(mice)
library(miceadds)
library(micemd)
library(parallel)
library(lcmm)

My imputation code, using the dataset 'data', is:

set.seed(2555)
#create a predictor matrix to use in the mice function
pred <- quickpred(data, mincor = 0.5, minpuc = 0.3)
#parallel mice to speed up computation on a CPU with 32 logical cores (31 cores x 1 imputation per core = 31 imputations);
#maxit = 0 (initial random draws only, no iterations) keeps computation time low while I test the code
imputed <- parlmice(data, n.core = 31, n.imp.core = 1, predictorMatrix = pred, maxit = 0, cluster.seed = 2555)

The problem is that the LCGA requires data in long format. So I first have to convert the mids object created by parlmice() to a data frame, then convert that data frame to long format, and then run the LCGA like this:

#create a data frame from the mids object 'imputed'
#(action = "long" stacks all imputed datasets on top of each other)
lcmm <- complete(imputed, action = "long", include = FALSE)

#convert data frame 'lcmm' to long format based on the outcome variable measured at different time points
lcmmlong <- lcmm %>%
  select(ID, x_1:x_14) %>%
  pivot_longer(
    cols = x_1:x_14,
    names_to = "time",
    names_prefix = "x_",
    names_transform = list(time = as.integer),  # numeric time instead of the strings "1", "2", ...
    values_to = "x",
    values_drop_na = TRUE
  )
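A minimal sketch of the reshaping step as a reusable function that is applied to one completed (wide) dataset at a time rather than to stacked data. It assumes, as in the question, an ID column and outcome columns named x_1, x_2, and so on; the function name to_long() and the toy data are hypothetical. Converting time to an integer matters because pivot_longer() returns the former column names as a character column by default, so x ~ time would not describe a linear trajectory.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape ONE completed (wide) dataset to long format.
# Assumes an ID column and outcome columns x_1, x_2, ... as in the question.
to_long <- function(wide) {
  wide %>%
    select(ID, matches("^x_[0-9]+$")) %>%
    pivot_longer(
      cols = matches("^x_[0-9]+$"),
      names_to = "time",
      names_prefix = "x_",
      names_transform = list(time = as.integer),  # numeric time, not character
      values_to = "x"
    ) %>%
    arrange(ID, time) %>%
    as.data.frame()  # return a plain data.frame rather than a tibble
}

# Toy wide data (hypothetical): 2 subjects, 3 time points
toy <- data.frame(ID = 1:2, x_1 = c(1, 4), x_2 = c(2, 5), x_3 = c(3, 6))
toy_long <- to_long(toy)
print(toy_long)
```

The same function can then be called on each element of a list of completed datasets, one imputation at a time.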

#run the LCGA on the long-format data frame
lcga1 <- hlme(x ~ time, random = ~ -1, subject = "ID", ng = 1, data = lcmmlong)
lcga2 <- gridsearch(rep = 100, maxiter = 10, minit = lcga1,
                    hlme(x ~ time, random = ~ -1, subject = "ID", ng = 2, data = lcmmlong, mixture = ~ time))
lcga3 <- gridsearch(rep = 100, maxiter = 10, minit = lcga1,
                    hlme(x ~ time, random = ~ -1, subject = "ID", ng = 3, data = lcmmlong, mixture = ~ time))

This of course does not work: with action = "long" (or "broad"), complete() puts all imputed datasets into the same data frame, either stacked or side by side, and the LCGA cannot be run on that combined data frame. The solution would be to run the LCGA on every individual imputed dataset and then pool the results. I have found solutions for this for several other analyses, such as this example from the 'mice' package itself:
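The stacked data frame is not a dead end: with action = "long", complete() adds an .imp column identifying the imputation (and .id for the original row), so the stacked data can be split back into one data frame per imputation and the LCGA run on each piece. A minimal sketch using the nhanes example data that ships with mice; the object names are illustrative.

```r
library(mice)

# Impute the nhanes example data from mice (5 imputations)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# action = "long" stacks the m completed datasets and adds
# .imp (imputation number) and .id (row identifier in the original data)
stacked <- complete(imp, action = "long", include = FALSE)

# Split the stacked data back into a list with one data frame per imputation
per_imp <- split(stacked, stacked$.imp)

sapply(per_imp, nrow)  # each element has nrow(nhanes) rows
```

complete(imp, action = "all") returns the same per-imputation datasets directly as a list, without the .imp and .id columns.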

imp <- mice(nhanes, seed = 123, print = FALSE)
fit <- with(imp, lm(chl ~ age + bmi + hyp))
est1 <- pool(fit)

However, I have no idea how to fit the conversion from broad (wide) to long format and the different LCGA steps into the mice(), with() and pool() workflow. From what I could find, I would have to run the LCGA manually on every imputed dataset and then pool the results manually:
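Manual pooling of LCGA parameters can follow Rubin's rules: average the point estimates, and combine the average within-imputation variance with the between-imputation variance. Below is a minimal base-R sketch for a single parameter; the helper name rubin_pool() and the numbers are hypothetical. The per-imputation estimates and standard errors would have to be extracted from each fitted hlme model (lcmm provides functions such as estimates() and VarCov() for this, but check the documentation of your lcmm version). One caveat specific to mixture models: class labels are arbitrary, so class 1 in one imputation may correspond to class 2 in another (label switching). The classes need to be matched across imputations, for example by their intercepts, before class-specific parameters are pooled.

```r
# Hypothetical helper: pool a single parameter over m imputations (Rubin's rules).
# est: m point estimates; se: m standard errors (one per imputed dataset).
rubin_pool <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                  # pooled point estimate
  ubar <- mean(se^2)                 # average within-imputation variance
  b <- var(est)                      # between-imputation variance
  t_var <- ubar + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(t_var))
}

# Hypothetical slope of one (already label-matched) class in 3 imputations
slope_est <- c(0.50, 0.54, 0.46)
slope_se  <- c(0.10, 0.10, 0.10)
pooled <- rubin_pool(slope_est, slope_se)
print(pooled)
```

mice::pool.scalar() implements the same rules (plus degrees of freedom) for a scalar estimate, given the vector of estimates and the vector of variances (squared standard errors).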

#extract the imputed datasets one by one, for n = 1 through the number of imputations (imputed$m)
n <- 1
lcmmn <- complete(imputed, action = n, include = FALSE)
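Instead of extracting the datasets one by one, complete(imputed, action = "all") returns all of them as a list, and a loop can reshape each one and fit the LCGA models separately. This is a minimal sketch on simulated data: the data, the object names, the small number of imputations and the small gridsearch() settings are all assumptions chosen to keep the run short. The loop is written at the top level, like the code in the question, because gridsearch() re-evaluates the hlme() call it is given and may not find objects that exist only inside a function.

```r
library(mice)
library(tidyr)
library(lcmm)

set.seed(2555)
# Simulated wide data (hypothetical): 100 subjects, x_1..x_5, two trajectories
n <- 100
grp <- rbinom(n, 1, 0.5)
wide <- data.frame(ID = 1:n)
for (wave in 1:5) {
  mu <- ifelse(grp == 1, 2 + 0.8 * wave, 5 - 0.5 * wave)
  x_w <- mu + rnorm(n, sd = 0.5)
  if (wave > 1) x_w[runif(n) < 0.1] <- NA   # ~10% missing from wave 2 on
  wide[[paste0("x_", wave)]] <- x_w
}

# Impute; the ID column is excluded as a predictor
pred <- quickpred(wide, exclude = "ID")
imputed <- mice(wide, m = 3, maxit = 5, predictorMatrix = pred,
                printFlag = FALSE, seed = 2555)

# One wide data frame per imputation
completed <- complete(imputed, action = "all")

fits <- vector("list", length(completed))
for (i in seq_along(completed)) {
  # Reshape imputation i to long format with a numeric time variable
  long_i <- as.data.frame(pivot_longer(
    completed[[i]], cols = starts_with("x_"), names_to = "time",
    names_prefix = "x_", names_transform = list(time = as.integer),
    values_to = "x"))
  # 1-class model, then a 2-class model started from it via gridsearch()
  m1 <- hlme(x ~ time, subject = "ID", ng = 1, data = long_i)
  m2 <- gridsearch(rep = 5, maxiter = 10, minit = m1,
                   hlme(x ~ time, mixture = ~ time, subject = "ID",
                        ng = 2, data = long_i))
  fits[[i]] <- list(ng1 = m1, ng2 = m2)
}
```

Each element of fits then holds the models for one imputed dataset, ready for comparing the number of classes and, after matching class labels, pooling parameters.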

And even then, running the LCGA is only one step: more steps are needed to determine the optimal number of trajectories, and so on. So my question is: how can I run the LCGA on the imputed datasets and pool the results, so that I can use the pooled results to determine the optimal number of trajectories?
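One simple heuristic for choosing the number of classes (not a formally derived pooling rule) is to compute a fit criterion such as the BIC in every imputed dataset, then look at both its average across imputations and how often each number of classes wins. A minimal base-R sketch with hypothetical log-likelihoods; for real fits, the log-likelihood and number of estimated parameters would come from each hlme model. The BIC here is -2 log L + p log(N) with N taken as the number of subjects. For the x ~ time models in the question (mixture = ~ time, no random effects), p is (ng - 1) class-membership parameters + 2 ng class-specific intercepts and slopes + 1 residual standard deviation, i.e. 3 ng.

```r
# Hypothetical log-likelihoods: rows = imputed datasets, columns = number of classes
loglik <- rbind(c(-1000, -950, -940),
                c(-1005, -952, -945),
                c( -998, -949, -941))
colnames(loglik) <- paste0("ng", 1:3)

ng <- 1:3
npm <- 3 * ng          # (ng - 1) + 2 * ng + 1 parameters per model
n_subjects <- 500      # hypothetical number of subjects

# BIC = -2 logL + npm * log(N), computed separately in every imputation
bic <- sweep(-2 * loglik, 2, npm * log(n_subjects), "+")

mean_bic <- colMeans(bic)                # average BIC per number of classes
votes <- apply(bic, 1, which.min)        # best number of classes per imputation
best_ng <- which.min(mean_bic)

print(round(mean_bic, 1))
print(table(votes))
```

In these toy numbers one imputation prefers three classes while the average favours two; such disagreement across imputations is worth reporting rather than hiding behind the average.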


Answer

Small excursion on FIML

Using lavaan on MI data
