epiguy
epiguy

Reputation: 37

Track status change and date of change in longitudinal data

I am trying to report chronological updates on whether participants in an ongoing study ever suffered car crash injuries. Suppose I am following a growing cohort of food delivery drivers and am tracking the severity of car crash injuries. Injuries for each crash are coded as none, minor, or severe. Any driver with a prior crash injury is excluded from entering the study. All data reflect incidents that occurred after study entry.

For each calendar quarter starting from the earliest observation, I’d like to report the number of drivers who crashed but (1) never sustained an injury; (2) ever sustained only minor injuries; and (3) ever sustained at least 1 severe injury. Assume that each crash results in no more than one injury. Once a driver sustains an injury, the severity of that injury will continue to get reported unless it is replaced by a more severe injury. Each driver should only be represented once per quarter. E.g., if a driver experienced multiple types of crash injuries in the same quarter, only one occurrence of the most severe injury classification should be reported.

Here is my data set:

df <- data.frame(
  DriverID = c('A', 'A', 'A', 'A', 'A',
         'B', 'B', 'B', 'B',
         'C',
         'D', 'D', 'D',
         'E', 'E'),
  study_entry = as.Date(c('2020-01-13', NA, NA, NA, NA,
                    '2020-03-02', NA, NA, NA,
                    '2020-04-15',
                    '2020-10-08', NA, NA,
                    '2020-07-20', NA)),
  crash_date = as.Date(c(NA, '2020-03-03', '2020-05-01', '2020-07-28', '2020-11-20',
                    NA, '2020-07-10', '2020-08-19', '2020-12-25',
                    NA,
                    NA, '2020-10-10', '2020-11-23',
                    NA, '2020-09-30')),
  crash_severity = c(NA, 'N', 'M', 'S', 'M', 
               NA, 'M', 'S', 'N',
               NA,
               NA, 'S', 'M',
               NA, 'N')
  )

Visually, the data might look like:

 |--Y1Q1--|--Y1Q2--|--Y1Q3--|--Y1Q4--|
A *    N     M       S            M
B      *            M   S           N
C              *        
D                            * S  M
E                    *         N

*: Entry into study
N: Non-injury crash
M: Minor-injury crash
S: Severe-injury crash

I’d like to be able to create the table below. Note that for each reporting period, the severity of future injuries is not reflected.

Period  Drivers  N   M   S  
Y1Q1      2      1   0   0
Y1Q2      3      0   1   0
Y1Q3      4      0   0   2
Y1Q4      5      1   0   3

Upvotes: 0

Views: 84

Answers (1)

mikebader
mikebader

Reputation: 1299

Here is the solution that I came up with using tidyverse.

library(tidyverse)
firstdates <- df %>%
    group_by(DriverID, crash_severity) %>%
    arrange(crash_date) %>%
    slice_min(crash_date) %>%
    filter(!is.na(crash_date)) %>%
    pivot_wider(id_cols = "DriverID", names_from = "crash_severity", values_from = "crash_date")

quarters <- tribble(
    ~quarter, ~enddate,
    "Y1Q1", "2020/03/31",
    "Y1Q2", "2020/06/30",
    "Y1Q3", "2020/09/30",
    "Y1Q4", "2020/12/31"
) %>%
    mutate(enddate = as.Date(enddate))

df3 <- df %>%
    select(DriverID, study_entry) %>%
    group_by(DriverID) %>%
    slice(rep(1, nrow(quarters))) %>%
    mutate(!!!quarters) %>%
    filter(study_entry < enddate) %>%
    ungroup() %>%
    left_join(firstdates, by = "DriverID") %>%
    mutate(
        crash = if_else(N < enddate, "N", "X"),
        crash = if_else(M < enddate, "M", crash),
        crash = if_else(S < enddate, "S", crash),
        crash = replace_na(crash, "X")
    ) %>%
    group_by(quarter, crash) %>%
    count() %>%
    pivot_wider(id_cols = "quarter", names_from = "crash", values_from = "n") %>%
    mutate(across(matches("[SMNX]"), ~replace_na(., 0))) %>%
    mutate(drivers = rowSums(across(matches("[SMNX]"))))
df3
# A tibble: 4 x 6
# Groups:   quarter [4]
  quarter     N     X     M     S drivers
  <chr>   <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Y1Q1        1     1     0     0       2
2 Y1Q2        0     2     1     0       3
3 Y1Q3        0     2     0     2       4
4 Y1Q4        0     2     0     3       5

The firstdates tibble contains a column for each level of severity with values equal to the date of the first crash of that severity.

We then create containing all of the quarters in the study with the last day of each quarter and repeat that dataset for every driver. We keep rows where the end date of the quarter came after the driver's entry into the study. We then merge the data on the first date of each type of crash and create the variable crash. The variable contains a letter indicating the most severe incident that occurred before the end date of that quarter with "X" standing for no crash having occurred.

After that we summarize the data by crash type and sum across the columns to calculate the number of drivers in the study (note the lack of an "M" column reflects the fact that all moderate crashes occurred after the driver already experienced a severe crash).

Upvotes: 1

Related Questions