David
David

Reputation: 11

R Flexsurv and time-dependent covariates

I read that the R flexsurv package can also be used for modeling time-dependent covariates according to Christopher Jackson (2016) ["flexsurv: a platform for parametric survival modeling in R, Journal of Statistical Software, 70 (1)].

However, I was not able to figure out how, even after several adjustments and searches in online forums.

Before turning to the estimation of time-dependent covariates I tried to create a simple model with only time-independent covariates to test whether I specified the Surv object correctly. Here is a small example.

library(splitstackshape)
library(flexsurv)

## create sample data
n=50
set.seed(2)
t <- rpois(n,15)+1
x <- rnorm(n,t,5)
df <- data.frame(t,x)
df$id <- 1:n
df$rep <- df$t-1

Which looks like this:

  t         x id rep
1 12 17.696149  1  11
2 12 20.358094  2  11
3 11  2.058789  3  10
4 16 26.156213  4  15
5 13  9.484278  5  12
6 15 15.790824  6  14
...

And the long data:

long.df <- expandRows(df, "rep")

rep.vec<-c()
for(i in 1:n){
  rep.vec <- c(rep.vec,1:(df[i,"t"]-1))
}

long.df$start <- rep.vec 
long.df$stop  <- rep.vec +1
long.df$censrec <- 0
long.df$censrec<-ifelse(long.df$stop==long.df$t,1,long.df$censrec)

Which looks like this:

     t         x id start stop censrec
1    12 17.69615  1     1    2       0
1.1  12 17.69615  1     2    3       0
1.2  12 17.69615  1     3    4       0
1.3  12 17.69615  1     4    5       0
1.4  12 17.69615  1     5    6       0
1.5  12 17.69615  1     6    7       0
1.6  12 17.69615  1     7    8       0
1.7  12 17.69615  1     8    9       0
1.8  12 17.69615  1     9   10       0
1.9  12 17.69615  1    10   11       0
1.10 12 17.69615  1    11   12       1
2    12 20.35809  2     1    2       0

...

Now I can estimate a simple Cox model to see whether it works:

coxph(Surv(t)~x,data=df)

This yields:

    coef exp(coef) se(coef)     z     p
x -0.0588    0.9429   0.0260 -2.26 0.024

And in the long format:

coxph(Surv(start,stop,censrec)~x,data=long.df)

I get:

     coef exp(coef) se(coef)     z     p
x -0.0588    0.9429   0.0260 -2.26 0.024

Taken together I conclude that my transformation into the long format was correct. Now, turning to the flexsurv framework:

flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")

yields:

Estimates: 
       data mean  est       L95%      U95%      se        exp(est)  L95%      U95%    
shape        NA    5.00086   4.05569   6.16631   0.53452        NA        NA        NA
scale        NA   13.17215  11.27876  15.38338   1.04293        NA        NA        NA
x      15.13380    0.01522   0.00567   0.02477   0.00487   1.01534   1.00569   1.02508

But

flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull")

causes an error:

Error in flexsurvreg(Surv(start, stop, censrec) ~ x, data = long.df, dist = "weibull") : 
  Initial value for parameter 1 out of range

Would anyone happen to know the correct syntax for the latter Surv object? If you use the correct syntax, do you get the same estimates?

Thank you very much, best, David

===============

EDIT AFTER FEEDBACK FROM 42

===============

library(splitstackshape)
library(flexsurv)
x<-c(8.136527,  7.626712,  9.809122, 12.125973, 12.031536, 11.238394,  4.208863,  8.809854,  9.723636)
t<-c(2, 3, 13,  5,  7, 37 ,37,  9,  4)

df <- data.frame(t,x)

#transform into long format for time-dependent covariates
df$id <- 1:length(df$t)
df$rep <- df$t-1
long.df <- expandRows(df, "rep")

rep.vec<-c()
for(i in 1:length(df$t)){
    rep.vec <- c(rep.vec,1:(df[i,"t"]-1))
}

long.df$start <- rep.vec 
long.df$stop  <- rep.vec +1
long.df$censrec <- 0
long.df$censrec<-ifelse(long.df$stop==long.df$t,1,long.df$censrec)


coxph(Surv(t)~x,data=df)
coxph(Surv(start,stop,censrec)~x,data=long.df)

flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")
flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull",inits=c(shape=.1, scale=1))

Which yields the same estimates for both coxph models but

Call:
flexsurvreg(formula = Surv(time = t) ~ x, data = df, dist = "weibull")

Estimates: 
       data mean  est       L95%      U95%      se        exp(est)  L95%      U95%    
shape        NA     1.0783    0.6608    1.7594    0.2694        NA        NA        NA
scale        NA    27.7731    3.5548  216.9901   29.1309        NA        NA        NA
x        9.3012    -0.0813   -0.2922    0.1295    0.1076    0.9219    0.7466    1.1383

N = 9,  Events: 9,  Censored: 0
Total time at risk: 117
Log-likelihood = -31.77307, df = 3
AIC = 69.54614

and

Call:
flexsurvreg(formula = Surv(start, stop, censrec) ~ x, data = long.df, 
    dist = "weibull", inits = c(shape = 0.1, scale = 1))

Estimates: 
       data mean  est       L95%      U95%      se        exp(est)  L95%      U95%    
shape        NA     0.8660    0.4054    1.8498    0.3353        NA        NA        NA
scale        NA    24.0596    1.7628  328.3853   32.0840        NA        NA        NA
x        8.4958    -0.0912   -0.3563    0.1739    0.1353    0.9128    0.7003    1.1899

N = 108,  Events: 9,  Censored: 99
Total time at risk: 108
Log-likelihood = -30.97986, df = 3
AIC = 67.95973

Upvotes: 1

Views: 1639

Answers (2)

Kate
Kate

Reputation: 349

I just want to mention that in flexsurv v1.1.1, running this code:

flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull")

doesn't return any errors. It also gives the same estimates as the non time-varying command

flexsurvreg(Surv(time=t)~x,data=df, dist="weibull")

enter image description here

Upvotes: 0

IRTFM
IRTFM

Reputation: 263342

Reading the error message:

Error in flexsurvreg(Surv(start, stop, censrec) ~ x, data = long.df, dist = "weibull", : initial values must be a numeric vector

And then reading the help page, ?flexsurvreg, it seemed as though an attempt at setting values for inits to a named numeric vector should be attempted:

flexsurvreg(Surv(start,stop,censrec) ~ x ,data=long.df, dist="weibull", inits=c(shape=.1, scale=1))
Call:
flexsurvreg(formula = Surv(start, stop, censrec) ~ x, data = long.df, 
    dist = "weibull", inits = c(shape = 0.1, scale = 1))

Estimates: 
       data mean  est       L95%      U95%      se        exp(est)  L95%      U95%    
shape        NA    5.00082   4.05560   6.16633   0.53454        NA        NA        NA
scale        NA   13.17213  11.27871  15.38341   1.04294        NA        NA        NA
x      15.66145    0.01522   0.00567   0.02477   0.00487   1.01534   1.00569   1.02508

N = 715,  Events: 50,  Censored: 665
Total time at risk: 715
Log-likelihood = -131.5721, df = 3
AIC = 269.1443

Extremely similar results. My guess was basically a stab in the dark, so I have no guidance on how to make a choice if this had not succeeded other than to "expand the search."

Upvotes: 1

Related Questions