Henrich
Henrich

Reputation: 11

How to estimate the median survival with upper and lower confidence limits for the median at 90% confidence levels?

My goal is estimate the median survival with upper and lower confidence limits for the median at 90% confidence levels, using a survfit object.

churn_dat <-read_csv("https://raw.githubusercontent.com/square/pysurvival/master/pysurvival/datasets/churn.csv")
churn_dat <- churn_dat %>% filter(months_active > 0)
#create a function of  the dataframe by sizes
boot <- function(size,n_sims){
#1. filter data into a particular size
df <- churn_dat %>% filter(company_size == size)
n = nrow(df)
#2. run the bootstrap
experiments = tibble(experiment = rep(1:n_sims, each = n),
                     index = sample(1:n, size = n * n_sims, replace = TRUE),
                     time_star = df$months_active[index],
                     event_star = df$churned[index])
return(experiments)
}


#create a function for plotting
plot_boot_data <- function(experiments){
  fit <- survfit(Surv(time_star, event_star) ~ experiment, data = experiments)
  #get the median of surv
  med <- surv_median(fit)
  med <- data.frame(med = med$median)
  ggplot(med , aes(x = med, fill= med)) +
    geom_histogram(binwidth = .8)+theme_bw()
}
df_10to50 <- boot("10-50",10)
plot_boot_data(df_10to50)

I have found the similar function i.e. surv_median() to do it, but the confidence levels is at 95 %

How can i construct the same thing with confidence levels set to 90 %

Upvotes: 1

Views: 1336

Answers (1)

IRTFM
IRTFM

Reputation: 263362

The surv_median-function in pkg:survminer is essentially doing what someone doing a screen scrape of the console would do after running the non-exposed survmean function in pkg:survival. (Note the need for the triple-colon (':::') extraction operator from package survival.) surv_median uses the hard coded column names and so is unable to deal with a fit-object that was constructed with a different value of the conf.int parameter in the results of a call to survfit. If you want the output of the survmean-function from such a call, it's not at all difficult. Using your data:

fit <- survfit(Surv(time_star, event_star) ~ experiment, data = df_10to50, conf.int=0.9)
med <- survival:::survmean(fit,rmean=FALSE)
med  # result is a named list
#------------
$matrix
              records n.max n.start events    rmean se(rmean) median 0.9LCL 0.9UCL
experiment=1      673   673     673    298 7.347565 0.2000873      7      5     12
experiment=2      673   673     673    309 7.152863 0.2028425      6      5     10
experiment=3      673   673     673    298 7.345891 0.2068490      9      5     12
experiment=4      673   673     673    323 7.035011 0.1981676      5      4      7
experiment=5      673   673     673    313 7.044400 0.2074104      6      5      9
experiment=6      673   673     673    317 7.061878 0.2021348      6      4      9
experiment=7      673   673     673    311 7.029602 0.2081835      5      4      9
experiment=8      673   673     673    301 7.345766 0.2032876      9      6     10
experiment=9      673   673     673    318 6.912700 0.2050143      7      5      9
experiment=10     673   673     673    327 6.988065 0.1990601      5      4      7

$end.time
 [1] 12 12 12 12 12 12 12 12 12 12

If you wanted the median and bounds at 0.9 confidence level it could be obtained with:

med$matrix[ 1 , 7:9]  # using numbers instead of column names.
#----------
median 0.9LCL 0.9UCL 
     7      5     12 

I'm afraid there is not a sufficient description of your goal of the process of getting there for me to make sense of the dplyr/magrittr chain of logic, so I'm unable to fill in the proper places in the boot function or the handling of its output by ggplot2. I was initially very confused because you were using a function named boot and I thought you were doing bootstrapped analysis, but there didn't seem to be any mechanism for getting any bootstrapped results, i.e. no randomized selection of rows in an indexable dataset.

If you still wanted to make a purpose-built variant of surv_median you might try modifying this line inside the code:

.table <- .table %>% dplyr::select_(
                       .dots = c("strata", "median", "`0.95LCL`", "`0.95UCL`"))

I wasn't able to figure out what surv_median was doing with the "strata" column since it didn't match the output of survmean, but that probably because it was using summary.survfit rather than going directly to the function that summary.survfit calls to do the calculations. So happy hacking.

Upvotes: 1

Related Questions