fliang
fliang

Reputation: 35

How to divide imputed data with 'survSplit' function and finally pool the analysis results using 'pool' function in MICE package

When I do cox regression using R, I found the proporitonal hazard assumption is violated. So I decided to divide the data into different time intervals using the 'survSplit' function and calculate the HR for each time interval.

For example:

1) divide data into different time intervals

`D1 <- survSplit(Surv(time, status) ~ ., data= data1, cut=c(40, 60),
episode= "tgroup", id="id")`

2) to get HR for each time interval

`coxph(Surv(tstart, time, status) ~ age + class:strata(tgroup), data=D1)`

Now I want to do the same for multiply-imputed datasets (n=5).

step 1 data imputation

imp <- mice(data1, seed = 12345, n=5,print = FALSE)

step 2 reanalysis

fit <- with(imp, coxph(Surv(tstart, time, status) ~ age + class:strata(tgroup))

step 3 pool the results

`summary(pool(fit))`

My question is how I can post process the imputed data to divide each imputed dataset into different time intervals using 'survSplit' between step 1 and step 2. After post-processing, I can continue to do step 2 and step3.

Although I can extract each imputed dataset and then divide it using 'survSplit', I am not able to continue step2 and step 3 using the MICE package. Kindly help. Thanks.

Upvotes: 0

Views: 577

Answers (1)

David L Thiessen
David L Thiessen

Reputation: 694

In this case the best approach is to use {} inside the with() command, which will allow you to place multiple statements inside. Here's a demonstration, based on the example in the survSplit() documentation.

library(survival)
library(mice)
set.seed(123)

# use the veteran dataset with some values made missing
veteran2 <- veteran
veteran2[which(rbinom(nrow(veteran2),1,0.1) == 1),
         "karno"] <- NA
# make imputations
imps <- mice(veteran2, printFlag = FALSE)
# fit the models, with survSplit used inside each call
mods <- with(imps, 
             {
               dat <- data.frame(mget(ls()))
               dat_split <- survSplit(Surv(time, status) ~ ., 
                                      data = dat,
                                      cut = c(60, 120), 
                                      episode = "timegroup")
               coxph(Surv(tstart, time, status) ~ karno*strata(timegroup), 
                     data = dat_split)
             })
pooled <- pool(mods)
summary(pooled)
#>                                 term    estimate   std.error statistic
#> 1                              karno -0.04986785 0.007715564 -6.463279
#> 2 karno:strata(timegroup)timegroup=2  0.03645985 0.015247181  2.391252
#> 3 karno:strata(timegroup)timegroup=3  0.04104428 0.013389159  3.065486
#>          df      p.value
#> 1  70.51041 1.141572e-08
#> 2  94.68223 1.876833e-02
#> 3 108.83144 2.740870e-03

c(t0_60 = pooled$pooled$estimate[1],
  t60_120 = sum(pooled$pooled$estimate[c(1,2)]),
  t_120 = sum(pooled$pooled$estimate[c(1,3)]))
#>        t0_60      t60_120        t_120 
#> -0.049867847 -0.013407999 -0.008823565

Created on 2022-08-24 by the reprex package (v2.0.1)

Upvotes: 0

Related Questions