R. Simian
R. Simian

Reputation: 187

Cumulative incidence plot and table in R

Context: I have a dataset with patients who had liver transplant (some of them are in treatment group A, others in treatment group B, depending on the immunosuppressive med they’re on). Due to the transplant, these patients are at high risk for developing donor derived HBV infections.

What’s needed: The investigator is interested in time until infection starts (first instance of HBV) and the proportion that develop the infection over time. They also want the cumulative incidence of HBV infection at baseline, and at each of the post-transplant follow-up timepoints (6 months, 12 months, 18 months and 24 months) for group A and group B. For example, the 6-months data would be the proportion of those patients with 6-months follow-up who ever had HBV, the 12-months data would the proportion of those with 12-months data follow-up who ever had HBV and so on.

Cumulative incidence in this specific case just means 1 minus the survival function, without accounting for any competing risks. The analysis population has no deaths or loss to follow up.

My questions are:

  1. Any way to add the number of events right under number at risk on the plot?
  2. Any way to also obtain the cumulative incidence numbers at each time point, for each group with the std.err and 95%CI, similar to the life tables we get when we use summary(km) below? these life tables give me the survival probabilities so I guess if I want cumulative incidence, I could just manually do 1-survival probability but not sure how to obtain the std.err and confidence intervals?

Below is a test dataset similar to the actual one, and what I have done so far:

time<-c(1.5989,6.9433, 0.8890, 3.2691, 1.0514, 2.7625, 1.4319, 0.9681, 7.4416, 0.0268, 1.5168, 1.9647, 0.0657, 4.3571, 6.4490, 0.2198, 1.2028, 0.9555, 0.2601, 2.0096, 7.5156, 0.4463, 0.2355, 0.9391, 2.6996)

censor<-c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

group<-c(1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1)

df<-data.frame(ID, time, censor, group)
View(df)

km<-survfit(formula = Surv(time, censor) ~ group, data = df)
summary(km)

#cumulative incidence plot
plot(km, fun = function(x) 1-x)

#log rank test;
survdiff(Surv(time, censor) ~ group, data=df)
#plot survival curves for each treatment group
#cumulative incidence plot 
ggsurvplot(km,
           data = df,
           censor = T,
           risk.table = TRUE,
           legend.labs = c("group 1", "group 2"),
           xlim = c(0,10),
           ylim = c(0,1),
           pval = T,
           pval.method = T,
           pval.method.coord = c(2.5,0.5),
           pval.coord = c(4.2,0.5),
           xlab = "Months",
           ylab = "SURVIVAL PROBABILITY",
           linetype = c(1,2),
           legend.title = "",
           palette = c('red', 'blue'),
           fun="event"
) 

Upvotes: 1

Views: 4641

Answers (1)

Behnam Hedayat
Behnam Hedayat

Reputation: 857

You can create custom table then remove ggsurvplot default table and replace it with your custom table.

As stated in A note on competing risks in survival data analysis article in British Journal of Cancer about cumulative incidence of mortality:

This is simply the converse of survival. In other words, the cumulative incidence of an event at a given time is one minus the overall survival probability at that time.

So for cumulative incidence of events as you said we can use 1-survival_probalility, Therefore their std.err is the same, as their confidence interval (albiet lower ci as upper ci and upper ci as lower ci).

In custom plot here, I added number of risk (first row), cumulatie number of events(second row), survival probability(third row) and its confidence intervals (fourth row)

libs <- c("survminer", "survival", "tidyverse")
invisible(suppressMessages(sapply(libs,library,character.only=T)))

time <- c(1.5989,6.9433, 0.8890, 3.2691, 1.0514, 2.7625, 1.4319, 0.9681, 7.4416, 
          0.0268, 1.5168, 1.9647, 0.0657, 4.3571, 6.4490, 0.2198, 1.2028, 0.9555, 
          0.2601, 2.0096, 7.5156, 0.4463, 0.2355, 0.9391, 2.6996)

censor <- c(1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

group <- c(1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1)

df <- data.frame(time, censor, group)

km <- survfit(formula = Surv(time, censor) ~ group, data = df)

#Draw cumulative incidence plot as usual
p1 <- ggsurvplot(km,
           data = df,
           censor = T,
           risk.table = T,
           legend.labs = c("group 1", "group 2"),
           xlim = c(0,7.5),
           ylim = c(0,1),
           pval = T,
           pval.method = T,
           pval.method.coord = c(2.5,0.5),
           pval.coord = c(4.2,0.5),
           xlab = "Months",
           ylab = "SURVIVAL PROBABILITY",
           linetype = c(1,2),
           legend.title = "",
           palette = c('red', 'blue'),
           fun="event",
           cumevents = F, 
           tables.height = 0.4
) 

# extract data table of survival plot

p1_df <- p1$table$data
p1_df
#>     strata time n.risk pct.risk n.event cum.n.event n.censor cum.n.censor
#> 1  group 1    0     14      100       0           0        0            0
#> 2  group 1    2      4       29      10          10        0            0
#> 3  group 1    4      2       14       0          10        2            2
#> 4  group 1    6      1        7       0          10        1            3
#> 5  group 1    8      0        0       0          10        1            4
#> 6  group 2    0     11      100       0           0        0            0
#> 7  group 2    2      5       45       6           6        0            0
#> 8  group 2    4      3       27       0           6        2            2
#> 9  group 2    6      3       27       0           6        0            2
#> 10 group 2    8      0        0       0           6        3            5
#>    strata_size group llabels
#> 1           14     1      14
#> 2           14     1       4
#> 3           14     1       2
#> 4           14     1       1
#> 5           14     1       0
#> 6           11     2      11
#> 7           11     2       5
#> 8           11     2       3
#> 9           11     2       3
#> 10          11     2       0


# use summary of km survfit in time points of survival plot

sum_df <-summary(km, times = p1_df$time) 
sum_df
#> Call: survfit(formula = Surv(time, censor) ~ group, data = df)
#> 
#>                 group=1 
#>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#>     0     14       0    1.000   0.000        1.000        1.000
#>     0     14       0    1.000   0.000        1.000        1.000
#>     2      4      10    0.286   0.121        0.125        0.654
#>     2      4       0    0.286   0.121        0.125        0.654
#>     4      2       0    0.286   0.121        0.125        0.654
#>     4      2       0    0.286   0.121        0.125        0.654
#>     6      1       0    0.286   0.121        0.125        0.654
#>     6      1       0    0.286   0.121        0.125        0.654
#> 
#>                 group=2 
#>  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#>     0     11       0    1.000    0.00        1.000        1.000
#>     0     11       0    1.000    0.00        1.000        1.000
#>     2      5       6    0.455    0.15        0.238        0.868
#>     2      5       0    0.455    0.15        0.238        0.868
#>     4      3       0    0.455    0.15        0.238        0.868
#>     4      3       0    0.455    0.15        0.238        0.868
#>     6      3       0    0.455    0.15        0.238        0.868
#>     6      3       0    0.455    0.15        0.238        0.868
str(sum_df)
#> List of 19
#>  $ n            : int [1:2] 14 11
#>  $ time         : num [1:16] 0 0 2 2 4 4 6 6 0 0 ...
#>  $ n.risk       : num [1:16] 14 14 4 4 2 2 1 1 11 11 ...
#>  $ n.event      : num [1:16] 0 0 10 0 0 0 0 0 0 0 ...
#>  $ n.censor     : num [1:16] 0 0 0 0 2 0 1 0 0 0 ...
#>  $ surv         : num [1:16] 1 1 0.286 0.286 0.286 ...
#>  $ std.err      : num [1:16] 0 0 0.121 0.121 0.121 ...
#>  $ cumhaz       : num [1:16] 0 0 1.17 1.17 1.17 ...
#>  $ std.chaz     : num [1:16] 0 0 0.39 0.39 0.39 ...
#>  $ strata       : Factor w/ 2 levels "group=1","group=2": 1 1 1 1 1 1 1 1 2 2 ...
#>  $ type         : chr "right"
#>  $ logse        : logi TRUE
#>  $ conf.int     : num 0.95
#>  $ conf.type    : chr "log"
#>  $ lower        : num [1:16] 1 1 0.125 0.125 0.125 ...
#>  $ upper        : num [1:16] 1 1 0.654 0.654 0.654 ...
#>  $ call         : language survfit(formula = Surv(time, censor) ~ group, data = df)
#>  $ table        : num [1:2, 1:9] 14 11 14 11 14 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "group=1" "group=2"
#>   .. ..$ : chr [1:9] "records" "n.max" "n.start" "events" ...
#>  $ rmean.endtime: num [1:2] 7.52 7.52
#>  - attr(*, "class")= chr "summary.survfit"


# extract our columns of interest from `sum_df` dataframe

columns <- lapply(c(2:10,15:16), function(x) sum_df[x])
tbl <- do.call(data.frame, columns)
tbl
#>    time n.risk n.event n.censor      surv   std.err   cumhaz  std.chaz  strata
#> 1     0     14       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=1
#> 2     0     14       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=1
#> 3     2      4      10        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 4     2      4       0        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 5     4      2       0        2 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 6     4      2       0        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 7     6      1       0        1 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 8     6      1       0        0 0.2857143 0.1207363 1.168229 0.3903649 group=1
#> 9     0     11       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=2
#> 10    0     11       0        0 1.0000000 0.0000000 0.000000 0.0000000 group=2
#> 11    2      5       6        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 12    2      5       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 13    4      3       0        2 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 14    4      3       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 15    6      3       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#> 16    6      3       0        0 0.4545455 0.1501314 0.736544 0.3072801 group=2
#>        lower     upper
#> 1  1.0000000 1.0000000
#> 2  1.0000000 1.0000000
#> 3  0.1248055 0.6540791
#> 4  0.1248055 0.6540791
#> 5  0.1248055 0.6540791
#> 6  0.1248055 0.6540791
#> 7  0.1248055 0.6540791
#> 8  0.1248055 0.6540791
#> 9  1.0000000 1.0000000
#> 10 1.0000000 1.0000000
#> 11 0.2379221 0.8684002
#> 12 0.2379221 0.8684002
#> 13 0.2379221 0.8684002
#> 14 0.2379221 0.8684002
#> 15 0.2379221 0.8684002
#> 16 0.2379221 0.8684002

# create custom table of our metrics and parameters of interests
custom_tbl <- data.frame(time=tbl$time, 
                         n.risk=tbl$n.risk, 
                         n.event=tbl$n.event,
                         cum.inc=1-tbl$surv, 
                         std.err=1-tbl$std.err,
                         ci=paste0(round((1-tbl$upper),2),"-",round((1-tbl$lower),2)),
                         strata=tbl$strata) %>% 
  mutate_if(is.numeric, round, digits=2)


# calculating cumulative incidence from n.event column, then remove duplicated rows and 
# replace them by last similar column

custom_tbl %<>% group_by(strata) %>% mutate(cum.n.event = cumsum(n.event)) %>% 
  group_by(time, strata) %>% 
  slice_tail(n=1)

custom_tbl
#> # A tibble: 8 × 8
#> # Groups:   time, strata [8]
#>    time n.risk n.event cum.inc std.err ci        strata  cum.n.event
#>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl> <chr>     <fct>         <dbl>
#> 1     0     14       0    0       1    0-0       group=1           0
#> 2     0     11       0    0       1    0-0       group=2           0
#> 3     2      4       0    0.71    0.88 0.35-0.88 group=1          10
#> 4     2      5       0    0.55    0.85 0.13-0.76 group=2           6
#> 5     4      2       0    0.71    0.88 0.35-0.88 group=1          10
#> 6     4      3       0    0.55    0.85 0.13-0.76 group=2           6
#> 7     6      1       0    0.71    0.88 0.35-0.88 group=1          10
#> 8     6      3       0    0.55    0.85 0.13-0.76 group=2           6

# removing default risk table of our survival plot
p1$table <- NULL

# creating a custom ggplot by our custom table
p1$table <- ggplot(custom_tbl, aes(time, strata)) +
  geom_text(aes(label = n.risk), size = 3, vjust=-0.8) +
  geom_text(aes(label = cum.n.event), size = 3, vjust=1) + 
  geom_text(aes(label = cum.inc), size = 3, vjust=2.8) + 
  geom_text(aes(label = ci), size = 3, vjust=4) + 
  labs(x = NULL, y = NULL) + theme_classic()+
  coord_cartesian(xlim = c(0, 7.5))+
  scale_x_continuous(breaks=seq(0, 9, 2))+
  scale_y_discrete(
    labels=c("group 1", "group 2"))+
  theme(axis.text.y = element_text(colour=c("red", "blue"), size = 12),
        axis.text.x = element_text(size = 12))
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> ℹ Results may be unexpected or may change in future versions of ggplot2.

p1

Created on 2023-02-13 with reprex v2.0.2

Upvotes: 3

Related Questions