Reputation: 225
I am having trouble the data below:
myDF <- structure(list(id = c("J13/5", "J13/5", "J13/5", "J13/5", "J13/5",
"J138/2", "J138/2", "J138/2", "J140/6", "J140/6", "J140/6", "J155/9",
"J155/9", "J177/4", "J193/2", "J197/5", "J197/5", "J197/5", "J197/5",
"J197/5", "J197/5", "J199/1", "J199/1", "J199/1", "J199/1", "J204/3",
"J230/1", "J25/5", "J25/9", "J25/9", "J25/9", "J259/2", "J261/2",
"J289/1", "J293/4", "J293/4", "J293/4", "J293/4", "J293/4", "J293/4",
"J293/4", "J298/7", "J298/7", "J298/7", "J298/7", "J30/8", "J30/8",
"J30/8", "J306/4", "J312/4"),
sample_type = c("A", "C1", "C2",
"C3", "A", "A1", "C1", "C2", "A", "C1", "C2", "A", "A", "A",
"C2", "AS", "A", "C1", "C2", "C3", "AS", "AS", "C1", "C2", "C3",
"A", "A", "C3", "A", "C1", "C2", "AS", "A", "A", "A", "A1", "AC1",
"C1", "C2", "A", "C3", "A", "C1", "C2", "AC3", "A1", "C1", "C2",
"A", "A"),
sample_date = c("09-Sep-16", "06-Oct-16", "21-Oct-16",
"22-Feb-17", "03-Jul-17", "08-Aug-16", "05-Sep-16", "19-Sep-16",
"31-Aug-16", "29-Sep-16", "10-Oct-16", "26-Jan-16", "15-Sep-17",
"25-Jan-16", "19-Sep-16", "22-Mar-16", "18-Jul-16", "15-Aug-16",
"30-Aug-16", "05-Jan-17", "03-Apr-17", "18-Mar-16", "19-Aug-16",
"02-Sep-16", "05-Jan-17", "03-Apr-17", "21-Mar-16", "15-Mar-17",
"30-Sep-16", "18-Oct-16", "10-Nov-16", "18-Mar-16", "04-Jul-17",
"04-Sep-17", "28-Jan-16", "21-Jul-16", "05-Aug-16", "18-Aug-16",
"31-Aug-16", "15-Sep-16", "04-Jan-17", "11-Jul-16", "08-Aug-16",
"22-Aug-16", "27-Sep-16", "28-Jul-16", "24-Aug-16", "07-Sep-16",
"03-Feb-16", "20-Jan-17")),
.Names = c("id", "sample_type", "sample_date"
), row.names = c(NA, -50L), class = c("data.table", "data.frame"
))
This data comes from a clinical trial where over a two year period, children are recruited with an acute episode (sample type = A) and are followed up over three subsequent timepoints (sample type = C1 or C2 or C3). All these have been defined differently depending on what a child presents with. Anything that falls outside means that the child presented with something else.
One could, therefore, say that a complete follow-up starts with A and ends with C3 and when a child completes this follow-up, they start all over again.
I want to add another variable (date_acute) that for every follow-up, is populated with the date when the child presented with an acute episode. For example, picking a few of the children, we expect the output to be as below:
id sample_type sample_date date_acute
child13/5 A 09-Sep-16 09-Sep-16
child13/5 C1 06-Oct-16 09-Sep-16
child13/5 C2 21-Oct-16 09-Sep-16
child13/5 C3 22-Feb-17 09-Sep-16
child13/5 A 03-Jul-17 03-Jul-17
child138/2 A1 08-Aug-16 NA
child138/2 C1 05-Sep-16 NA
child138/2 C2 19-Sep-16 NA
child140/6 A 31-Aug-16 31-Aug-16
child140/6 C1 29-Sep-16 31-Aug-16
child140/6 C2 10-Oct-16 31-Aug-16
child155/9 A 26-Jan-16 26-Jan-16
child155/9 A 15-Sep-17 15-Sep-17
child177/4 A 25-Jan-16 25-Jan-16
Upvotes: 1
Views: 58
Reputation: 3062
One approach could be the following:
library(data.table)
setDT(yDF)
yDF[, date_acute := sample_date[sample_type == "A"], by = .(id, cumsum(sample_type == "A" ))]
> yDF
id sample_type sample_date date_acute
1: J13/5 A 09-Sep-16 09-Sep-16
2: J13/5 C1 06-Oct-16 09-Sep-16
3: J13/5 C2 21-Oct-16 09-Sep-16
4: J13/5 C3 22-Feb-17 09-Sep-16
5: J13/5 A 03-Jul-17 03-Jul-17
6: J138/2 A1 08-Aug-16 NA
7: J138/2 C1 05-Sep-16 NA
8: J138/2 C2 19-Sep-16 NA
9: J140/6 A 31-Aug-16 31-Aug-16
10: J140/6 C1 29-Sep-16 31-Aug-16
11: J140/6 C2 10-Oct-16 31-Aug-16
12: J155/9 A 26-Jan-16 26-Jan-16
13: J155/9 A 15-Sep-17 15-Sep-17
14: J177/4 A 25-Jan-16 25-Jan-16
15: J193/2 C2 19-Sep-16 NA
16: J197/5 AS 22-Mar-16 NA
17: J197/5 A 18-Jul-16 18-Jul-16
18: J197/5 C1 15-Aug-16 18-Jul-16
19: J197/5 C2 30-Aug-16 18-Jul-16
20: J197/5 C3 05-Jan-17 18-Jul-16
21: J197/5 AS 03-Apr-17 18-Jul-16
22: J199/1 AS 18-Mar-16 NA
23: J199/1 C1 19-Aug-16 NA
24: J199/1 C2 02-Sep-16 NA
25: J199/1 C3 05-Jan-17 NA
Upvotes: 2