R - for every participant, populate subsequent entries with the date of recruitment (clinical data)

Question

I am having trouble the data below:

myDF <- structure(list(id = c("J13/5", "J13/5", "J13/5", "J13/5", "J13/5", 
   "J138/2", "J138/2", "J138/2", "J140/6", "J140/6", "J140/6", "J155/9", 
   "J155/9", "J177/4", "J193/2", "J197/5", "J197/5", "J197/5", "J197/5", 
   "J197/5", "J197/5", "J199/1", "J199/1", "J199/1", "J199/1", "J204/3", 
   "J230/1", "J25/5", "J25/9", "J25/9", "J25/9", "J259/2", "J261/2", 
   "J289/1", "J293/4", "J293/4", "J293/4", "J293/4", "J293/4", "J293/4", 
   "J293/4", "J298/7", "J298/7", "J298/7", "J298/7", "J30/8", "J30/8", 
   "J30/8", "J306/4", "J312/4"), 
sample_type = c("A", "C1", "C2", 
     "C3", "A", "A1", "C1", "C2", "A", "C1", "C2", "A", "A", "A", 
     "C2", "AS", "A", "C1", "C2", "C3", "AS", "AS", "C1", "C2", "C3", 
     "A", "A", "C3", "A", "C1", "C2", "AS", "A", "A", "A", "A1", "AC1", 
     "C1", "C2", "A", "C3", "A", "C1", "C2", "AC3", "A1", "C1", "C2", 
     "A", "A"), 
sample_date = c("09-Sep-16", "06-Oct-16", "21-Oct-16", 
    "22-Feb-17", "03-Jul-17", "08-Aug-16", "05-Sep-16", "19-Sep-16", 
    "31-Aug-16", "29-Sep-16", "10-Oct-16", "26-Jan-16", "15-Sep-17", 
    "25-Jan-16", "19-Sep-16", "22-Mar-16", "18-Jul-16", "15-Aug-16", 
    "30-Aug-16", "05-Jan-17", "03-Apr-17", "18-Mar-16", "19-Aug-16", 
    "02-Sep-16", "05-Jan-17", "03-Apr-17", "21-Mar-16", "15-Mar-17", 
    "30-Sep-16", "18-Oct-16", "10-Nov-16", "18-Mar-16", "04-Jul-17", 
    "04-Sep-17", "28-Jan-16", "21-Jul-16", "05-Aug-16", "18-Aug-16", 
    "31-Aug-16", "15-Sep-16", "04-Jan-17", "11-Jul-16", "08-Aug-16", 
    "22-Aug-16", "27-Sep-16", "28-Jul-16", "24-Aug-16", "07-Sep-16", 
    "03-Feb-16", "20-Jan-17")), 
.Names = c("id", "sample_type", "sample_date"
    ), row.names = c(NA, -50L), class = c("data.table", "data.frame"
    ))

This data comes from a clinical trial where over a two year period, children are recruited with an acute episode (sample type = A) and are followed up over three subsequent timepoints (sample type = C1 or C2 or C3). All these have been defined differently depending on what a child presents with. Anything that falls outside means that the child presented with something else.

One could, therefore, say that a complete follow-up starts with A and ends with C3 and when a child completes this follow-up, they start all over again.

I want to add another variable (date_acute) that for every follow-up, is populated with the date when the child presented with an acute episode. For example, picking a few of the children, we expect the output to be as below:

id  sample_type sample_date date_acute
child13/5   A   09-Sep-16   09-Sep-16
child13/5   C1  06-Oct-16   09-Sep-16
child13/5   C2  21-Oct-16   09-Sep-16
child13/5   C3  22-Feb-17   09-Sep-16
child13/5   A   03-Jul-17   03-Jul-17
child138/2  A1  08-Aug-16   NA
child138/2  C1  05-Sep-16   NA
child138/2  C2  19-Sep-16   NA
child140/6  A   31-Aug-16   31-Aug-16
child140/6  C1  29-Sep-16   31-Aug-16
child140/6  C2  10-Oct-16   31-Aug-16
child155/9  A   26-Jan-16   26-Jan-16
child155/9  A   15-Sep-17   15-Sep-17
child177/4  A   25-Jan-16   25-Jan-16

User2321 · Accepted Answer

One approach could be the following:

library(data.table)

setDT(yDF)
yDF[, date_acute := sample_date[sample_type == "A"], by = .(id, cumsum(sample_type == "A" ))]

  > yDF
        id sample_type sample_date date_acute
 1:  J13/5           A   09-Sep-16  09-Sep-16
 2:  J13/5          C1   06-Oct-16  09-Sep-16
 3:  J13/5          C2   21-Oct-16  09-Sep-16
 4:  J13/5          C3   22-Feb-17  09-Sep-16
 5:  J13/5           A   03-Jul-17  03-Jul-17
 6: J138/2          A1   08-Aug-16         NA
 7: J138/2          C1   05-Sep-16         NA
 8: J138/2          C2   19-Sep-16         NA
 9: J140/6           A   31-Aug-16  31-Aug-16
10: J140/6          C1   29-Sep-16  31-Aug-16
11: J140/6          C2   10-Oct-16  31-Aug-16
12: J155/9           A   26-Jan-16  26-Jan-16
13: J155/9           A   15-Sep-17  15-Sep-17
14: J177/4           A   25-Jan-16  25-Jan-16
15: J193/2          C2   19-Sep-16         NA
16: J197/5          AS   22-Mar-16         NA
17: J197/5           A   18-Jul-16  18-Jul-16
18: J197/5          C1   15-Aug-16  18-Jul-16
19: J197/5          C2   30-Aug-16  18-Jul-16
20: J197/5          C3   05-Jan-17  18-Jul-16
21: J197/5          AS   03-Apr-17  18-Jul-16
22: J199/1          AS   18-Mar-16         NA
23: J199/1          C1   19-Aug-16         NA
24: J199/1          C2   02-Sep-16         NA
25: J199/1          C3   05-Jan-17         NA

R - for every participant, populate subsequent entries with the date of recruitment (clinical data)

Answers (1)

Related Questions