user19807620

Calculating survival time from the date of hospital admission

I have a dataset in R containing each patient's date of hospital admission and date of death.

For example:

set.seed(1)
tep <- data.frame(
  Date_of_birth      = sample(c("11-12-1987", "11-10-1999", "19-01-1977", "20-12-1950"), 20, TRUE),
  Hospital_admission = sample(c("11-02-2019", "11-03-2019", "10-02-2019", "11-03-2019", "10-03-2019"), 20, TRUE),
  Death_date         = sample(c("10-03-2019", "10-06-2019", "12-01-2020", "05-03-2019", "01-02-2020"), 20, TRUE)
)

I need to flag deaths within three different time windows after admission, as follows:

  1. A variable coded 1/0 for death within 30 days of hospital admission: 1 = died within 30 days, 0 = did not.
  2. The same, but for death within 6 months of hospital admission.
  3. The same again, for death within one year of hospital admission.

Then I need to combine all three into a single table with percentages.

Finally, I need to use the gender variable to make a table of these three indicators by gender.

Can anyone help me with this?

Upvotes: 2

Views: 395

Answers (3)

TarJae

Reputation: 78927

Here is a slightly different approach from the (better) one by @Gregor Thomas:

library(dplyr)
library(lubridate)
tep %>% 
  mutate(across(everything(), dmy)) %>% 
  mutate(days   = interval(Hospital_admission, Death_date) %>% as.numeric("days"),
         months = interval(Hospital_admission, Death_date) %>% as.numeric("months")) %>% 
  mutate(dead30d = ifelse(days < 30, 1, 0),
         dead6M  = ifelse(months < 6, 1, 0),
         dead1Y  = ifelse(months < 12, 1, 0),
         .keep = "unused")
   Date_of_birth Hospital_admission Death_date dead30d dead6M dead1Y
   <date>        <date>             <date>       <dbl>  <dbl>  <dbl>
 1 1950-12-20    2019-03-10         2020-02-01       0      0      1
 2 1977-01-19    2019-02-10         2020-01-12       0      0      1
 3 1977-01-19    2019-03-11         2019-06-10       0      1      1
 4 1977-01-19    2019-02-10         2019-03-05       1      1      1
 5 1987-12-11    2019-03-11         2019-06-10       0      1      1
 6 1977-01-19    2019-02-10         2020-01-12       0      0      1
 7 1950-12-20    2019-02-11         2020-01-12       0      0      1
 8 1950-12-20    2019-02-10         2019-06-10       0      1      1
 9 1977-01-19    2019-02-10         2019-06-10       0      1      1
10 1950-12-20    2019-03-11         2019-03-05       1      1      1
11 1987-12-11    2019-03-10         2020-02-01       0      0      1
12 1987-12-11    2019-03-10         2019-03-10       1      1      1
13 1950-12-20    2019-02-11         2019-03-10       1      1      1
14 1987-12-11    2019-03-11         2020-02-01       0      0      1
15 1977-01-19    2019-03-11         2020-02-01       0      0      1
16 1987-12-11    2019-02-10         2020-02-01       0      0      1
17 1950-12-20    2019-02-10         2019-06-10       0      1      1
18 1987-12-11    2019-03-11         2019-03-10       1      1      1
19 1977-01-19    2019-03-11         2019-03-10       1      1      1
20 1977-01-19    2019-02-11         2019-03-05       1      1      1
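The question also asks for a table of percentages and a breakdown by gender. Because each indicator is coded 0/1, its mean is the proportion of patients who died in that window. The base-R sketch below assumes a data frame holding the three indicator columns (named as in the output above) plus a gender column; the question's tep has no gender variable, so the small data frame here, gender included, is made up purely for illustration.

```r
# Made-up example data: three 0/1 death indicators (named as in the output
# above) plus a hypothetical gender column, which tep does not contain.
df <- data.frame(
  gender  = c("F", "M", "F", "M", "F", "M"),
  dead30d = c(1, 0, 0, 1, 0, 0),
  dead6M  = c(1, 1, 0, 1, 0, 0),
  dead1Y  = c(1, 1, 1, 1, 0, 1)
)
ind <- c("dead30d", "dead6M", "dead1Y")

# Overall percentage who died in each window: the mean of a 0/1 column
# is the proportion of 1s.
overall_pct <- round(100 * colMeans(df[ind]), 1)
print(overall_pct)

# The same percentages split by gender (one row per gender).
by_gender_pct <- aggregate(df[ind], by = list(gender = df$gender),
                           FUN = function(x) round(100 * mean(x), 1))
print(by_gender_pct)
```

With real data you would apply the same two steps to the result of the pipeline above after joining in the gender column; passing FUN = sum to aggregate() gives counts instead of percentages.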

Upvotes: 0

Allan Cameron

Reputation: 173793

When you look at 30-day survival, anyone who lives past 30 days is censored at 30 days; that is, you can cap their observation time at 30 days. Those who die within 30 days have reached the endpoint.

Therefore, you need to convert your dates to actual Date objects (currently they are just character strings), work out the number of days between hospital admission and death, and then create a new variable that assigns each patient to endpoint (1) or censored (0). Using these, you can create a survival column that is a Surv object:

library(survival)
library(tidyverse)

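tep <- tep %>% 
  mutate(across(everything(), lubridate::dmy),                  # parse dd-mm-yyyy strings to Date
         survival = as.numeric(Death_date - Hospital_admission),  # days from admission to death
         survival = ifelse(survival < 0, 0, survival),            # deaths recorded before admission -> day 0
         endpoint = ifelse(survival > 30, 0, 1),                  # 1 = died within 30 days, 0 = censored
         survival = ifelse(survival > 30, 30, survival),          # cap follow-up at 30 days
         survival = Surv(survival, endpoint)) %>%
  select(-endpoint)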

This adds a single new column, survival, which records both the follow-up time and whether the patient was censored (shown with a +):

tep
#>    Date_of_birth Hospital_admission Death_date survival
#> 1     1987-12-11         2019-02-11 2019-03-05       22
#> 2     1950-12-20         2019-03-10 2019-03-10        0
#> 3     1977-01-19         2019-03-10 2019-03-10        0
#> 4     1987-12-11         2019-02-11 2019-03-05       22
#> 5     1999-10-11         2019-02-11 2019-03-10       27
#> 6     1987-12-11         2019-03-10 2019-06-10      30+
#> 7     1977-01-19         2019-03-10 2020-01-12      30+
#> 8     1977-01-19         2019-03-11 2019-06-10      30+
#> 9     1999-10-11         2019-03-11 2019-06-10      30+
#> 10    1999-10-11         2019-02-11 2020-02-01      30+
#> 11    1977-01-19         2019-03-11 2019-06-10      30+
#> 12    1977-01-19         2019-02-11 2019-03-10       27
#> 13    1987-12-11         2019-03-11 2020-01-12      30+
#> 14    1987-12-11         2019-02-10 2020-01-12      30+
#> 15    1987-12-11         2019-03-11 2019-03-05        0
#> 16    1999-10-11         2019-03-11 2020-01-12      30+
#> 17    1999-10-11         2019-03-11 2019-03-10        0
#> 18    1999-10-11         2019-03-11 2019-03-05        0
#> 19    1999-10-11         2019-03-11 2020-02-01      30+
#> 20    1977-01-19         2019-03-11 2019-03-10        0

This survival column can then be the basis for many types of survival model. To get an overall look at the survival curve for this data set, for example, we can simply do:

plot(survfit(survival ~ 1, data = tep))
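The same fit can also give the percentages the question asks for. A minimal sketch, assuming the survival package: summary() on a survfit object accepts a times argument, and 1 minus the Kaplan-Meier estimate at day 30 is the estimated 30-day mortality. The times and statuses below are transcribed from the printed output above rather than recomputed. Because nobody in these data is censored before day 30, the estimate equals the plain proportion of patients who died within 30 days (10 of 20).

```r
library(survival)

# Follow-up times and event flags transcribed from the 30-day output above
# ("+" in the printout means censored, i.e. status 0).
time   <- c(22, 0, 0, 22, 27, 30, 30, 30, 30, 30,
            30, 27, 30, 30, 0, 30, 0, 0, 30, 0)
status <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
            0, 1, 0, 0, 1, 0, 1, 1, 0, 1)

fit <- survfit(Surv(time, status) ~ 1)

# Kaplan-Meier survival estimate at day 30
s30 <- summary(fit, times = 30)
print(s30$surv)       # estimated probability of surviving 30 days
print(1 - s30$surv)   # estimated 30-day mortality
```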

For 6-month survival, the usual approach in the medical literature is to approximate 6 months as 180 days, so we would simply substitute 180 for 30 in the code above.
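The substitution can be wrapped in a small function so that the 30-day, 180-day and 365-day versions all come from one place. This is only a sketch: censor_at is a hypothetical helper, not part of any package, and it applies the same rules as the code above (negative gaps set to 0, a death on or before the horizon counted as an event, follow-up capped at the horizon).

```r
library(survival)

# Hypothetical helper (not from any package): censor follow-up at `horizon`
# days, following the same rules as the answer's code.
censor_at <- function(admission, death, horizon) {
  days  <- as.numeric(death - admission)   # Date - Date is a difference in days
  days  <- pmax(days, 0)                   # deaths recorded before admission -> day 0
  event <- as.integer(days <= horizon)     # 1 = died within the horizon, 0 = censored
  time  <- pmin(days, horizon)             # cap follow-up at the horizon
  Surv(time, event)
}

# Three admission/death pairs taken from the example output above
adm   <- as.Date(c("2019-02-11", "2019-03-10", "2019-03-11"))
death <- as.Date(c("2019-03-05", "2019-06-10", "2019-03-05"))

s30  <- censor_at(adm, death, 30)    # 30-day survival
s180 <- censor_at(adm, death, 180)   # 6-month survival, approximated as 180 days
s365 <- censor_at(adm, death, 365)   # 1-year survival
print(s30)
print(s180)
```

The second patient (admitted 2019-03-10, died 2019-06-10, 92 days later) is censored at 30 days but counts as an event at 180 days.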

Created on 2022-08-22 with reprex v2.0.2

Upvotes: 1

Gregor Thomas

Reputation: 145765

I'd highly recommend the lubridate package for working with dates.

We'll convert your columns to Date class, and then create your variables:

library(dplyr)
library(lubridate)

tep %>%
  mutate(across(everything(), dmy)) %>%
  mutate(
    dead30d = as.integer(Death_date <= Hospital_admission + days(30)),   # died within 30 days
    dead6m  = as.integer(Death_date <= Hospital_admission + months(6)),  # died within 6 calendar months
    dead1y  = as.integer(Death_date <= Hospital_admission + years(1))    # died within 1 year
  )
#    Date_of_birth Hospital_admission Death_date dead30d dead6m dead1y
# 1     1987-12-11         2019-02-11 2019-03-05       1      1      1
# 2     1950-12-20         2019-03-10 2019-03-10       1      1      1
# 3     1977-01-19         2019-03-10 2019-03-10       1      1      1
# 4     1987-12-11         2019-02-11 2019-03-05       1      1      1
# 5     1999-10-11         2019-02-11 2019-03-10       1      1      1
# 6     1987-12-11         2019-03-10 2019-06-10       0      1      1
# 7     1977-01-19         2019-03-10 2020-01-12       0      0      1
# ...

Also note that "6 months" is less well defined than other time periods, because months vary in length. For more consistency you might consider using days(180), days(182), or similar.
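A related lubridate detail: adding a Period of months or years with + returns NA when the resulting calendar date does not exist (for example, 31 August plus six months, or 29 February plus one year), so such rows would get NA rather than 0 or 1 in dead6m or dead1y. The %m+% operator instead rolls back to the last day of the month. The question's admission dates all fall on the 10th or 11th, so this does not arise in the example above; the dates below are hypothetical.

```r
library(lubridate)

adm <- as.Date("2019-08-31")   # hypothetical admission date

adm + months(6)      # NA: 31 February 2020 does not exist
adm %m+% months(6)   # "2020-02-29": rolled back to the last day of February

# Hypothetical death date, to show the effect on a dead6m-style flag
death <- as.Date("2020-02-15")
as.integer(death <= adm + months(6))     # NA
as.integer(death <= adm %m+% months(6))  # 1
```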

Upvotes: 4
