Reputation: 87
I'm attempting to conduct survival analysis with time-varying covariates. The data comes from a longitudinal survey that is administered annually, and I have manipulated it to look like this:
id event end.time income1 income2 income3 income4
1 1 3 8 10 13 8
2 0 4 13 15 24 35
event indicates whether the event occurred or not, end.time is the time to event, and I have my time-varying covariates for each subsequent period to the right. So, for observation 1, the event occurred at year 3, and during year 1, they earned an income of 8 thousand dollars, etc. For observation 2, the event is censored, and we have data up to year 4 (when the study ends).
In the end, I'd like my data to look something like this:
id st.time end.time event inc
1 0 1 0 8
1 1 2 0 10
1 2 3 1 13
2 0 1 0 13
2 1 2 0 15
2 2 3 0 24
2 3 4 0 35
I've looked up the tmerge() and survSplit() functions from the survival package, but I'm unsure how to apply them in this specific situation. It seems that with survSplit() I could use yearly cutpoints, but I'm not sure how it would reshape the time-varying covariates.
Would a generic reshape work better?
Any advice would be appreciated.
Upvotes: 0
Views: 743
Reputation: 389065
A general reshape, followed by some manipulation with dplyr, would probably work.
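library(dplyr)

df %>%
  # one row per id and income column
  tidyr::pivot_longer(cols = starts_with('income'), values_to = 'inc') %>%
  group_by(id) %>%
  # keep only the periods up to each id's end.time
  slice(1:first(end.time)) %>%
  mutate(end.time = row_number(),              # interval end: 1, 2, 3, ...
         st.time = end.time - 1,               # interval start
         event = replace(event, -n(), 0)) %>%  # keep the event on the last row only
  select(-name)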
#      id event end.time   inc st.time
#   <int> <dbl>    <dbl> <int>   <dbl>
# 1     1     0        1     8       0
# 2     1     0        2    10       1
# 3     1     1        3    13       2
# 4     2     0        1    13       0
# 5     2     0        2    15       1
# 6     2     0        3    24       2
# 7     2     0        4    35       3
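Each row of this output is one (st.time, end.time] interval per id, which is the counting-process layout that Surv() in the survival package accepts with three arguments. As a minimal sketch, assuming the rows are typed in by hand from the output above (the names long and y are only for illustration):

```r
library(survival)

# Long-format rows copied from the output above (hypothetical object name)
long <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 2),
  st.time  = c(0, 1, 2, 0, 1, 2, 3),
  end.time = c(1, 2, 3, 1, 2, 3, 4),
  event    = c(0, 0, 1, 0, 0, 0, 0),
  inc      = c(8, 10, 13, 13, 15, 24, 35)
)

# Counting-process response: one (start, stop] interval per row
y <- with(long, Surv(st.time, end.time, event))
print(y)

# With a realistically sized data set, a Cox model could then be fit
# along these lines (not run here, two subjects are far too few):
# fit <- coxph(Surv(st.time, end.time, event) ~ inc, data = long)
```

The coxph() call is left commented out because the two-subject toy data cannot support a meaningful fit.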
data
df <- structure(list(id = 1:2, event = 1:0, end.time = 3:4, income1 = c(8L,
13L), income2 = c(10L, 15L), income3 = c(13L, 24L), income4 = c(8L,
35L)), class = "data.frame", row.names = c(NA, -2L))
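If you would rather avoid the tidyverse, the same layout can probably be reached with base R's reshape(). This is only a sketch under the same assumptions as above (four income columns, one per year); the names long and period are made up for illustration.

```r
# Same toy data as above
df <- data.frame(id = 1:2, event = c(1, 0), end.time = c(3, 4),
                 income1 = c(8, 13), income2 = c(10, 15),
                 income3 = c(13, 24), income4 = c(8, 35))

# Wide -> long: one row per id and period (hypothetical column name "period")
long <- reshape(df, direction = "long",
                varying = paste0("income", 1:4), v.names = "inc",
                timevar = "period", times = 1:4, idvar = "id")

# Drop periods after each id's end.time and sort by id, then period
long <- long[long$period <= long$end.time, ]
long <- long[order(long$id, long$period), ]

# The event can only occur in the final interval of each id
long$event <- ifelse(long$period == long$end.time, long$event, 0)

# Build the (st.time, end.time] intervals from the period index
long$st.time  <- long$period - 1
long$end.time <- long$period

long <- long[, c("id", "st.time", "end.time", "event", "inc")]
rownames(long) <- NULL
long
```

Note that event is recomputed before end.time is overwritten, since the comparison needs the original end.time.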
Upvotes: 1