Reputation:
I have a dataset in R with each patient's date of birth, date of hospital admission, and date of death.
For example:
set.seed(1)
tep <- data.frame(
  Date_of_birth      = sample(c("11-12-1987", "11-10-1999", "19-01-1977", "20-12-1950"), 20, replace = TRUE),
  Hospital_admission = sample(c("11-02-2019", "11-03-2019", "10-02-2019", "11-03-2019", "10-03-2019"), 20, replace = TRUE),
  Death_date         = sample(c("10-03-2019", "10-06-2019", "12-01-2020", "05-03-2019", "01-02-2020"), 20, replace = TRUE)
)
I need to work out whether each patient died within three different time windows after admission: 30 days, 6 months, and 1 year.
Then I need to combine all three into a single table with percentages.
Finally, I need to use the gender variable to make a table of these three outcomes by gender.
Can anyone help me with this?
Upvotes: 2
Views: 395
Reputation: 78927
Here is a slightly different approach from @Gregor Thomas's (better) answer:
library(dplyr)
library(lubridate)

tep %>%
  mutate(across(everything(), dmy)) %>%
  mutate(
    days   = interval(Hospital_admission, Death_date) %>% as.numeric('days'),
    months = interval(Hospital_admission, Death_date) %>% as.numeric('months')
  ) %>%
  mutate(
    dead30d = ifelse(days < 30, 1, 0),
    dead6M  = ifelse(months < 6, 1, 0),
    dead1Y  = ifelse(months < 12, 1, 0),
    .keep = "unused"
  )
Date_of_birth Hospital_admission Death_date dead30d dead6M dead1Y
<date> <date> <date> <dbl> <dbl> <dbl>
1 1950-12-20 2019-03-10 2020-02-01 0 0 1
2 1977-01-19 2019-02-10 2020-01-12 0 0 1
3 1977-01-19 2019-03-11 2019-06-10 0 1 1
4 1977-01-19 2019-02-10 2019-03-05 1 1 1
5 1987-12-11 2019-03-11 2019-06-10 0 1 1
6 1977-01-19 2019-02-10 2020-01-12 0 0 1
7 1950-12-20 2019-02-11 2020-01-12 0 0 1
8 1950-12-20 2019-02-10 2019-06-10 0 1 1
9 1977-01-19 2019-02-10 2019-06-10 0 1 1
10 1950-12-20 2019-03-11 2019-03-05 1 1 1
11 1987-12-11 2019-03-10 2020-02-01 0 0 1
12 1987-12-11 2019-03-10 2019-03-10 1 1 1
13 1950-12-20 2019-02-11 2019-03-10 1 1 1
14 1987-12-11 2019-03-11 2020-02-01 0 0 1
15 1977-01-19 2019-03-11 2020-02-01 0 0 1
16 1987-12-11 2019-02-10 2020-02-01 0 0 1
17 1950-12-20 2019-02-10 2019-06-10 0 1 1
18 1987-12-11 2019-03-11 2019-03-10 1 1 1
19 1977-01-19 2019-03-11 2019-03-10 1 1 1
20 1977-01-19 2019-02-11 2019-03-05 1 1 1
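The question also asks for a combined table of percentages and a breakdown by gender. Because each flag above is coded 0/1, its mean is the proportion of patients who died within that window. Below is a minimal base-R sketch of that step. It assumes a hypothetical Gender column (the question's sample data has none) and uses a small made-up set of flags purely for illustration:

```r
# Hypothetical example: 0/1 death flags like those produced above, plus an
# assumed Gender column that is NOT in the question's sample data.
flags <- data.frame(
  Gender  = c("F", "M", "F", "M", "F", "M"),
  dead30d = c(1, 0, 0, 1, 0, 0),
  dead6M  = c(1, 1, 0, 1, 0, 0),
  dead1Y  = c(1, 1, 1, 1, 0, 1)
)

outcomes <- c("dead30d", "dead6M", "dead1Y")

# Overall: the mean of a 0/1 column is the proportion of 1s, so 100 * mean
# gives the percentage of patients who died within each window.
overall <- round(100 * colMeans(flags[outcomes]), 1)
print(overall)

# By gender: one row per gender, one column per time window.
by_gender <- aggregate(flags[outcomes],
                       by = list(Gender = flags$Gender),
                       FUN = function(x) round(100 * mean(x), 1))
print(by_gender)
```

With dplyr, the same summary could be written with group_by(Gender) followed by summarise(across(...)).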
Upvotes: 0
Reputation: 173793
When you look at 30-day survival, anyone who lives past 30 days is censored at 30 days; that is, you can cap their observation time at 30 days. Those who died within 30 days have reached the endpoint.
Therefore, you need to convert your dates to actual Date objects (currently they are just character strings), work out the time between hospital admission and death, then create a new variable that assigns each patient to either endpoint (1) or censored (0). Using these, you can create a survival column which is a Surv object:
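library(survival)
library(tidyverse)

tep <- tep %>%
  mutate(across(everything(), lubridate::dmy),
         # days from admission to death; deaths dated before admission count as 0
         survival = as.numeric(Death_date - Hospital_admission),
         survival = ifelse(survival < 0, 0, survival),
         # 1 = died within 30 days (endpoint), 0 = alive at 30 days (censored)
         endpoint = ifelse(survival > 30, 0, 1),
         # cap follow-up time at 30 days
         survival = ifelse(survival > 30, 30, survival),
         survival = Surv(survival, endpoint)) %>%
  select(-endpoint)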
This adds a single new column, survival, which tells us both the follow-up time and whether the patient was censored (shown with a +):
tep
#> Date_of_birth Hospital_admission Death_date survival
#> 1 1987-12-11 2019-02-11 2019-03-05 22
#> 2 1950-12-20 2019-03-10 2019-03-10 0
#> 3 1977-01-19 2019-03-10 2019-03-10 0
#> 4 1987-12-11 2019-02-11 2019-03-05 22
#> 5 1999-10-11 2019-02-11 2019-03-10 27
#> 6 1987-12-11 2019-03-10 2019-06-10 30+
#> 7 1977-01-19 2019-03-10 2020-01-12 30+
#> 8 1977-01-19 2019-03-11 2019-06-10 30+
#> 9 1999-10-11 2019-03-11 2019-06-10 30+
#> 10 1999-10-11 2019-02-11 2020-02-01 30+
#> 11 1977-01-19 2019-03-11 2019-06-10 30+
#> 12 1977-01-19 2019-02-11 2019-03-10 27
#> 13 1987-12-11 2019-03-11 2020-01-12 30+
#> 14 1987-12-11 2019-02-10 2020-01-12 30+
#> 15 1987-12-11 2019-03-11 2019-03-05 0
#> 16 1999-10-11 2019-03-11 2020-01-12 30+
#> 17 1999-10-11 2019-03-11 2019-03-10 0
#> 18 1999-10-11 2019-03-11 2019-03-05 0
#> 19 1999-10-11 2019-03-11 2020-02-01 30+
#> 20 1977-01-19 2019-03-11 2019-03-10 0
This survival column can then be the basis for many types of survival model. To get an overall look at the survival curve for this data set, for example, we can simply do:
plot(survfit(survival ~ 1, data = tep))
For 6-month survival, the usual approach in the medical literature is to approximate 6 months as 180 days, so we would simply substitute 180 for 30 in the code above.
Created on 2022-08-22 with reprex v2.0.2
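The capping logic above can be wrapped in a small function so the same code handles 30 days, 180 days, or any other horizon. This is a minimal base-R sketch: the function name censor_at() is hypothetical, and, like the code above, it treats death dates recorded before admission as time 0.

```r
# Hypothetical helper: right-censor follow-up at `horizon` days.
# admit, death: Date vectors of equal length.
censor_at <- function(admit, death, horizon) {
  days   <- as.numeric(death - admit)
  days   <- pmax(days, 0)                 # deaths dated before admission -> 0, as above
  status <- as.integer(days <= horizon)   # 1 = died within horizon, 0 = censored
  time   <- pmin(days, horizon)           # cap follow-up at the horizon
  data.frame(time = time, status = status)
}

admit <- as.Date(c("2019-02-11", "2019-03-10", "2019-03-11"))
death <- as.Date(c("2019-03-05", "2019-06-10", "2019-03-05"))

d30  <- censor_at(admit, death, 30)    # 30-day survival
d180 <- censor_at(admit, death, 180)   # 6 months approximated as 180 days
print(d30)
print(d180)

# With the survival package loaded, Surv(d30$time, d30$status) gives the
# same kind of Surv column as in the answer above.
```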
Upvotes: 1
Reputation: 145765
I'd highly recommend the lubridate package for working with dates.
We'll convert your columns to Date class, and then create your variables:
library(dplyr)
library(lubridate)

tep %>%
  mutate(across(everything(), dmy)) %>%
  mutate(
    dead30d = as.integer(Death_date <= Hospital_admission + days(30)),
    dead6m  = as.integer(Death_date <= Hospital_admission + months(6)),
    dead1y  = as.integer(Death_date <= Hospital_admission + years(1))
  )
# Date_of_birth Hospital_admission Death_date dead30d dead6m dead1y
# 1 1987-12-11 2019-02-11 2019-03-05 1 1 1
# 2 1950-12-20 2019-03-10 2019-03-10 1 1 1
# 3 1977-01-19 2019-03-10 2019-03-10 1 1 1
# 4 1987-12-11 2019-02-11 2019-03-05 1 1 1
# 5 1999-10-11 2019-02-11 2019-03-10 1 1 1
# 6 1987-12-11 2019-03-10 2019-06-10 0 1 1
# 7 1977-01-19 2019-03-10 2020-01-12 0 0 1
# ...
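If you'd rather avoid the package dependency, the same conversion and flags can be sketched in base R. This is an alternative, not the answer's code: as.Date() with format "%d-%m-%Y" matches the question's day-month-year strings, and seq() with by = "n months" is used as an assumed stand-in for lubridate's months() and years(). The helper add_months() is hypothetical, and the three rows are taken from the question's date values.

```r
# Three rows built from the question's date strings (day-month-year).
dat <- data.frame(
  Hospital_admission = c("11-02-2019", "10-03-2019", "10-03-2019"),
  Death_date         = c("05-03-2019", "10-06-2019", "12-01-2020")
)

# Parse the strings into Date objects (similar to lubridate::dmy()).
admit <- as.Date(dat$Hospital_admission, format = "%d-%m-%Y")
death <- as.Date(dat$Death_date, format = "%d-%m-%Y")

# Hypothetical helper: add n calendar months to each date. seq() accepts only
# a single start date, so apply it element-wise and recombine into a Date vector.
add_months <- function(dates, n) {
  do.call(c, lapply(dates, function(d) seq(d, by = paste(n, "months"), length.out = 2)[2]))
}

dat$dead30d <- as.integer(death <= admit + 30)
dat$dead6m  <- as.integer(death <= add_months(admit, 6))
dat$dead1y  <- as.integer(death <= add_months(admit, 12))
print(dat)
```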
Also note that "6 months" is less well-defined than other time periods, because months vary in length. You might consider using days(180) or days(182) instead for more consistency.
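To see the difference concretely, here is a small base-R sketch comparing fixed-day windows with a calendar-month window for one admission date from the question. Base R's seq() is used for the calendar months as an illustration; lubridate behaves differently at month ends (date + months(6) returns NA when the target day does not exist, and its %m+% operator rolls back to the last day of the month instead).

```r
admit <- as.Date("2019-03-11")

# Fixed-length windows: plain day arithmetic.
end180 <- admit + 180                                # "2019-09-07"
end182 <- admit + 182                                # "2019-09-09"

# Calendar window: same day of the month, six months later.
end6m <- seq(admit, by = "6 months", length.out = 2)[2]   # "2019-09-11"

# At a month end there is no exact equivalent: 31 Aug + 6 months would be
# 31 Feb, which base R's seq() counts forward into March.
end_overflow <- seq(as.Date("2019-08-31"), by = "6 months", length.out = 2)[2]  # "2020-03-02"

print(c(end180, end182, end6m, end_overflow))
```

The fixed-day versions always give the same window length for every patient, which is the consistency the answer refers to.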
Upvotes: 4