laus0204

Reputation: 15

tmerge function in R for time dependent covariates

I have two tibbles, df1 and df2, and I want to build df_temp from them using dplyr operations. The goal is to set up a time-varying covariate for a survival model with delayed entry, where the time scale is age (so start_time is an age). Does anyone have a solution using dplyr or tmerge?

library(dplyr)
library(magrittr)
library(survival)


df1 =
  tibble(id = c(1,2,3),
         start_time = c(5,10,15),
         stop_time = c(8,17,25),
         event = c(1,1,0))


df2 = tibble(
         id = c(1,2,3),
         stop_time_cancer = c(6, NA, 20),
         cancer_status = c(1,0,1))


df_temp <- tibble(
  id = c(1,1,2,3,3),
  start_time = c(5,6,10,15,20),
  stop_time = c(6,8,17,20,25), 
  cancer_event = c(0, 1, 0, 0, 1),
  event = c(0,1, 1, 0, 0)
)

Thanks!

I tried doing it using the tmerge function, but since I have delayed entry, I couldn't get it to work.

Upvotes: 0

Views: 283

Answers (1)

r2evans

Reputation: 160492

This currently uses fuzzyjoin for the non-equi join, which my reading of the problem requires. Once dplyr 1.1.0 is released, this can likely be done with its join_by() functionality instead (ref: https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/#join-improvements).

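# library(fuzzyjoin)   # only needed if you drop the fuzzyjoin:: prefix
out <- fuzzyjoin::fuzzy_left_join(
    df1, df2,
    # match on id; keep the cancer time only if it falls within [start_time, stop_time]
    by = c(id="id", start_time="stop_time_cancer", stop_time="stop_time_cancer"),
    match_fun = list(`==`, `<=`, `>=`)
  ) %>%
  rowwise() %>%
  # each input row yields one row (no cancer time) or two (split at the cancer time)
  summarize(
    id = id.x,
    start_time = c(start_time, na.omit(stop_time_cancer)),
    stop_time = sort(c(na.omit(stop_time_cancer), stop_time)),
    event = c(if (!is.na(stop_time_cancer)) 0, event),      # the event stays on the last interval
    cancer_event = c(0, if (!is.na(stop_time_cancer)) 1)    # 0 before the cancer time, 1 after
  )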
out
# # A tibble: 5 × 5
#      id start_time stop_time event cancer_event
#   <dbl>      <dbl>     <dbl> <dbl>        <dbl>
# 1     1          5         6     0            0
# 2     1          6         8     1            1
# 3     2         10        17     1            0
# 4     3         15        20     0            0
# 5     3         20        25     0            1
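
For reference, here is a rough sketch of what the join_by() route might look like. It assumes dplyr >= 1.1.0 is installed; the name out_joinby is mine, and I've swapped summarize() for reframe(), since dplyr 1.1.0 deprecates returning more than one row per group from summarize(). The input tibbles are repeated so the block runs on its own. Treat it as an untested-in-production sketch rather than a drop-in replacement.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, for join_by() and reframe()

df1 <- tibble(id = c(1, 2, 3), start_time = c(5, 10, 15),
              stop_time = c(8, 17, 25), event = c(1, 1, 0))
df2 <- tibble(id = c(1, 2, 3), stop_time_cancer = c(6, NA, 20),
              cancer_status = c(1, 0, 1))

out_joinby <- df1 %>%
  # keep a cancer time only when it falls inside [start_time, stop_time]
  left_join(df2, by = join_by(id,
                              start_time <= stop_time_cancer,
                              stop_time >= stop_time_cancer)) %>%
  rowwise() %>%
  # one output row when there is no cancer time, two when the interval is split
  reframe(
    id = id,
    start_time = c(start_time, na.omit(stop_time_cancer)),
    stop_time = sort(c(na.omit(stop_time_cancer), stop_time)),
    event = c(if (!is.na(stop_time_cancer)) 0, event),
    cancer_event = c(0, if (!is.na(stop_time_cancer)) 1)
  )
```

Because id is an equality key in join_by(), the joined result has a single id column, so the id.x used with fuzzyjoin is not needed here.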

Verification:

all.equal(df_temp, out[,names(df_temp)])
# [1] TRUE
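
Since the question also asks about tmerge, here is a minimal sketch of how delayed entry might be handled there: as far as I understand it, the tstart and tstop arguments of the first tmerge() call set each subject's time range, so start_time can be passed as tstart. The names d1, d2, base and out_tmerge are mine, and I use plain data frames rather than tibbles as a precaution. Note that tmerge names its interval columns tstart and tstop by default, so they would need renaming to match df_temp.

```r
library(survival)

# Same data as df1/df2 in the question, as plain data frames
d1 <- data.frame(id = c(1, 2, 3), start_time = c(5, 10, 15),
                 stop_time = c(8, 17, 25), event = c(1, 1, 0))
d2 <- data.frame(id = c(1, 2, 3), stop_time_cancer = c(6, NA, 20))

# First call: establish each subject's time range.
# tstart = start_time is what carries the delayed entry.
base <- tmerge(d1["id"], d1, id = id,
               event = event(stop_time, event),
               tstart = start_time, tstop = stop_time)

# Second call: add a time-dependent covariate that switches from 0 to 1
# at stop_time_cancer; rows with a missing time are skipped.
out_tmerge <- tmerge(base, d2, id = id,
                     cancer_event = tdc(stop_time_cancer))
```

The result can then be used in a counting-process model such as coxph(Surv(tstart, tstop, event) ~ cancer_event, data = out_tmerge), although three subjects are of course far too few to fit anything meaningful.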

Upvotes: 3
