Reputation: 61
In R, what is the best way to incorporate the interaction term between a covariate and time, when the proportionality test (with coxph) shows that the proportionality assumption in the Cox model is violated? I know that you can either use strata or an interaction with time term, I'm interested in the latter. I haven't been able to find a definitive clear explanation with examples on how to do this on the internet. In the most common example using the Rossi dataset, Fox suggested to do,
coxph(formula = Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data = Rossi.2)
Is there a difference between modeling with age:stop versus age:start? Does the formula have to use this format? If I use the Surv with the two parameter format, would the following also make sense?
coxph(formula = Surv(week, arrest) ~ fin + age + age:week + prio, data = Rossi)
Or you have to split the dataset and use the Surv(start,stop,event) method? Also, there is the time-transform method, so,
coxph(formula = Surv(week, arrest) ~ fin + age + tt(age) + prio, data = Rossi, tt=function(x,t,...) x*t)
I know that some people would prefer model with the log(t)
instead of t
here. But which one of these is the correct method to model interaction with time? Do these all refer to the same/different underlying statistical model? And the end, are all modeling (for the interaction term): h(t) = h0(t)exp(b*X*t)
?
Upvotes: 6
Views: 5265
Reputation: 7568
This is essentially a 3 part question:
survival::coxph
functionI will try to answer these questions in the following using the veteran data example, which is featured in section 4.2 of the vignette on time-dependent covariates and time-dependent coefficients (also known as time-varying effects) in the survival
package:
library(dplyr)
library(survival)
data("veteran", package = "survival")
veteran <- veteran %>%
mutate(
trt = 1L * (trt == 2),
prior = 1L * (prior == 10))
head(veteran)
#> trt celltype time status karno diagtime age prior
#> 1 0 squamous 72 1 60 7 69 0
#> 2 0 squamous 411 1 70 5 64 1
#> 3 0 squamous 228 1 60 3 38 0
#> 4 0 squamous 126 1 60 9 63 1
#> 5 0 squamous 118 1 70 11 65 1
#> 6 0 squamous 10 1 20 5 49 0
There are different popular methods and implementations, e.g. survival::coxph
, timereg::aalen
or using GAMs after appropriate data transformation (see below).
Although the specific methods and their implementaitons differ, a general idea ist to create a long form data set where
Then, the time (or a transformation of time, e.g. log(t)) is simply a covariate and time-varying effects can be estimated by an interaction between the covariate of interest and the (transformed) covariate of time.
If the functional form of the time-variation is known, you can use the tt()
aproach:
cph_tt <- coxph(
formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran,
tt = function(x, t, ...) x * log(t + 20))
survival::coxph
functionThere is no difference. I assume the tt()
function is simply a short-cut for the estimation via transformation to the long-format. You can verify that the two approaches are equivalent using the code below:
veteran_long <- survSplit(Surv(time, status)~., data = veteran, id = "id",
cut = unique(veteran$time)) %>%
mutate(log_time = log(time + 20))
head(veteran_long) %>% select(id, trt, age, tstart, time, log_time, status)
#> id trt age tstart time log_time status
#> 1 1 0 69 0 1 3.044522 0
#> 2 1 0 69 1 2 3.091042 0
#> 3 1 0 69 2 3 3.135494 0
#> 4 1 0 69 3 4 3.178054 0
#> 5 1 0 69 4 7 3.295837 0
#> 6 1 0 69 7 8 3.332205 0
cph_long <- coxph(formula = Surv(tstart, time, status)~
trt + prior + karno + karno:log_time, data = veteran_long)
## models are equivalent, just different specification
cbind(coef(cph_long), coef(cph_tt))
#> [,1] [,2]
#> trt 0.01647766 0.01647766
#> prior -0.09317362 -0.09317362
#> karno -0.12466229 -0.12466229
#> karno:log_time 0.02130957 0.02130957
As mentioned before, time-varying effects are simply interactions of a covariate x
and time t
, thus time-varying effects can have different specifications, equivalent to interactions in standard regression models, e.g.
x*t
: linear covariate effect, linearly time-varying effect f(x)*t
: non-linear covariate effect, linearly time-varying effectf(t)*x
: linear covariate effect, non-linearly time-varying (for categorical x) this essentially represents a stratified baseline hazard f(x, t)
: non-linear, non-linearly time-varying effectIn each case, the functional form of the effect f
can either be estimated from the data or prespecified (e.g. f(t)*x = karno * log(t + 20)
above).
In most cases you would prefer to estimate f
from the data. The support for the (penalized) estimation of such effects is to my knowledge limited in the survival
package. However, you can use mgcv::gam
to estimate any of the effects specified above (after appropriate data transformation). An example is given below and shows that the effect of karno
goes towards 0 as time progresses, regardless of the Karnofsky score at the beginning of the follow-up (see here for details and also Section 4.2 here):
library(pammtools)
# data transformation
ped <- as_ped(veteran, Surv(time, status)~., max_time = 400)
# model
pam <- mgcv::gam(ped_status ~ s(tend) + trt + prior + te(tend, karno, k = 10),
data = ped, family = poisson(), offset = offset, method = "REML")
p_2d <- gg_tensor(pam)
p_slice <- gg_slice(ped, pam, "karno", tend = unique(tend), karno = c(20, 50, 80), reference = list(karno = 60))
gridExtra::grid.arrange(p_2d, p_slice, nrow = 1)
Upvotes: 10