jpsmith
jpsmith

Reputation: 17240

Transforming longitudinal data for time-to-event analysis in R

I am trying to reformat longitudinal data for a time to event analysis. In the example data below, I simply want to find the earliest week that the result was “0” for each ID.

The specific issue I am having is how to patients that don't convert to 0, and had either all 1's or 2's. In the example data, patient J has all 1's.

#Sample data
have<-data.frame(patient=rep(LETTERS[1:10], each=9),
                 week=rep(0:8,times=10),
                 result=c(1,0,2,rep(0,6),1,1,2,1,rep(0,5),1,1,rep(0,7),1,rep(0,8),
                          1,1,1,1,2,1,0,0,0,1,1,1,rep(0,6),1,2,1,rep(0,6),1,2,rep(0,7),
                          1,rep(0,8),rep(1,9)))
patient week result
A       0   1
A       1   0
A       2   2
A       3   0
A       4   0
A       5   0
A       6   0
A       7   0
A       8   0
B       0   1
B       1   0
...     .....
J       6   1
J       7   1
J       8   1

I am able to do this relatively straightforward process with the following code:

want<-aggregate(have$week, by=list(have$patient,have$result), min) want<-want[which(want[2]==0),]

but realize if someone does not convert to 0, it excludes them (in this example, patient J is excluded). Instead, J should be present with a 1 in the second column and an 8 in the third column. Instead it of course is omitted

print(want)
Group.1 Group.2 x
A       0       1
B       0       4
C       0       2
D       0       1
E       0       6
F       0       3
G       0       3
H       0       2
I       0       1
#But also need
J       1       8

Pursuant to guidelines on posting here, I did work to solve this, am able to get what I need very inelegantly:

mins<-aggregate(have$week, by=list(have$patient,have$result), min)
maxs<-aggregate(have$week, by=list(have$patient,have$result), max)
want<-rbind(mins[which(mins[2]==0),],maxs[which(maxs[2]==1&maxs[3]==8),])

This returns the correct desired dataset, but the coding is terrible and not sustainable as I work with other datasets (i.e. datasets with different timeframes since I have to manually put in maxsp[3]==8, etc).

Is there a more elegant or systematic way to approach this data manipulation issue?

Upvotes: 1

Views: 318

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389012

We can write a function to select a row from the group.

select_row <- function(result, week) {
   if(any(result == 0)) which.max(result == 0) else which.max(week)
}

This function returns the index of first 0 value if it is present or else returns index of maximum value of week.

and apply it to all groups.

library(dplyr)
have %>% group_by(patient) %>% slice(select_row(result, week))

#  patient  week result
#   <fct>   <int>  <dbl>
# 1 A           1      0
# 2 B           4      0
# 3 C           2      0
# 4 D           1      0
# 5 E           6      0
# 6 F           3      0
# 7 G           3      0
# 8 H           2      0
# 9 I           1      0
#10 J           8      1

Upvotes: 1

Related Questions