Sheikh
Sheikh

Reputation: 57

How to generate survival data with time dependent covariates using R

i want to generate survival time from a Cox proportional hazards model that contains time dependent covariate. The model is

h(t|Xi) =h_0(t) exp(gamma*Xi + alpha*mi(t))

where Xi is generated from Binomial(1,0.5) and mi(t) is a time-dependent covariate.

For time-independent covariate i generated as follows

#For time independent case
# h_0(t) = 1
gamma <- -1
u <- runif(n=100,min=0,max=1)
Xi <- rbinom(n=100,size=1,prob=0.5)
T <- -log(u)/exp(gamma*Xi)

Can anyone please help me to generate survival data with time-varying covariate.

Upvotes: 4

Views: 2834

Answers (1)

AdamO
AdamO

Reputation: 4960

Time dependent covariates are entered into a Cox model by censoring the observation at the time that the covariate changes and re-entering them into the cohort either at time 0 or at the time of censor. Therefore a covariate matrix is generated with intervals and merged many-to-one for non-events and many-to-two for events based on pre/post periods of the event observation. You can drop covariate modifications following events. Concurrent changes in covariate values and failures are not handled by the Cox model so we must exclude this possibility.

For simulating the survival outcomes, you generate covariate values and switch points. Then you simulate survival outcomes according to the baseline covariate value. If the first covariate change time exceeds the failure time, keep the failure time and that participant has no covariate changes, otherwise censor the observation at that failure time and re-enter them into the cohort at the censoring time with the new covariate value. Simulate the second covariate value change (if it exists) and the second possible failure time, evaluate them iteratively and end only once the failure time precedes a covariate change value.

Laying this out, someone may provide more efficient code than I, but a simple way to do it is with recursion. I will assume for the time being that there is a constant baseline hazard function (exponential survival), but the principals of simulating survival outcomes according to an arbitrary baseline hazard function have been described elsewhere and I leave that to you. I'll also assume m is binary for simplicity. Again, this is foundation for you to wrap up.

sim <- function(id=1:100, t0= rep(0, 100), m0=rep(0, 100), bfail=c(0,0), rchange=1) {
  tfail <- rexp(length(id), exp(bfail[1] + bfail[2]*m0))
  tchange <- rexp(length(id), rchange)
  tevent <- pmin(tfail, tchange)
  fevent <- tfail == tevent
  if (all(fevent))
    return(list(cbind('start'=t0, 'stop'=t0+tevent, 'event'=fevent, 'id'=id, 'm'=m0)))
  c(
    list(cbind('start'=t0, 'stop'=t0+tevent, 'event'=fevent, 'id'=id, 'm'=m0)), 
    sim(id = id[!fevent], t0=(t0+tevent)[!fevent], m0=1-m0[!fevent], bfail, rchange)
  )

}

Can run an example with:

set.seed(123)
times <- sim(id=1:1000, t0= rep(0, 1000), m0=rep(0, 1000), bfail=c(-1, -2), rchange=0.4)
times <- as.data.frame(do.call(rbind, times))
coxph(Surv(start, stop, event) ~ m, data=times)

Giving the following output:

> coxph(Surv(start, stop, event) ~ m, data=times)
Call:
coxph(formula = Surv(start, stop, event) ~ m, data = times)

    coef exp(coef) se(coef)     z      p
m -1.917     0.147    0.100 -19.1 <2e-16

Likelihood ratio test=533  on 1 df, p=0
n= 2852, number of events= 1000 

Compare -1.917 coefficient to -2 the log hazard ratio for the exponential survival outcomes.

Upvotes: 1

Related Questions