Reputation: 103
I have a data.table
like the following:
ID Date Flag_1 Flag_2
1 2016-01-01 1 0
1 2016-02-01 0 0
1 2016-03-01 0 1
1 2016-04-01 0 0
1 2016-05-01 1 0
1 2016-06-01 0 0
1 2016-07-01 0 1
2 2017-03-01 0 0
2 2017-04-01 1 0
2 2017-05-01 0 0
2 2017-06-01 0 0
2 2017-07-01 0 1
I would like to create a Flag_1_date
variable, which links the date of the preceding flag_1 event to the flag_2 event.
It is important to me, that only those flag_1 dates should be linked to the flag_2 event, which has the same ID as the flag_2 event.
ID Date Flag_1 Flag_2 Flag_1_date
1 2016-01-01 1 0 0
1 2016-02-01 0 0 0
1 2016-03-01 0 1 2016-01-01
1 2016-04-01 0 0 0
1 2016-05-01 1 0 0
1 2016-06-01 0 0 0
1 2016-07-01 0 1 2016-05-01
2 2017-03-01 0 0 0
2 2017-04-01 1 0 0
2 2017-05-01 0 0 0
2 2017-06-01 0 0 2017-04-01
2 2017-07-01 0 1 0
Upvotes: 1
Views: 106
Reputation: 66819
I guess the idiomatic way is
DT[Flag_2 == 1, v :=
DT[Flag_1 == 1][.SD, on=.(ID, Date), roll = TRUE, x.Date]
]
ID Date Flag_1 Flag_2 v
1: 1 2016-01-01 1 0 <NA>
2: 1 2016-02-01 0 0 <NA>
3: 1 2016-03-01 0 1 2016-01-01
4: 1 2016-04-01 0 0 <NA>
5: 1 2016-05-01 1 0 <NA>
6: 1 2016-06-01 0 0 <NA>
7: 1 2016-07-01 0 1 2016-05-01
8: 2 2017-03-01 0 0 <NA>
9: 2 2017-04-01 1 0 <NA>
10: 2 2017-05-01 0 0 <NA>
11: 2 2017-06-01 0 0 <NA>
12: 2 2017-07-01 0 1 2017-04-01
How it works
x[i, v := expr]
writes to the v
column on rows selected by i
..SD
is a stand-in for the current Subset of Data, so the expr
is equivalent to
DT[Flag_1 == 1][DT[Flag_2 == 1], on=.(ID, Date), roll = TRUE, x.Date]
This line is another case of x[i, j]
: i = DT[Flag_2 == 1]
is used to select rows in x = DT[Flag_1 == 1]
. The on=
and roll=
arguments pin down how this selection happens. For details, see ?data.table
. The x.*
prefix in j
indicates we want the column from x
, not i
.
Upvotes: 2
Reputation: 24178
You could use na.locf()
from the zoo
package to forward fill Date
when Flag_1==1
by "ID"
, and subsequently omit the values where Flag_2==0
:
library(zoo)
library(data.table)
dt[Flag_1==1, Flag_1_date := Date][
,Flag_1_date := na.locf(Flag_1_date), "ID"][
Flag_2==0, Flag_1_date := NA
]
> dt
# ID Date Flag_1 Flag_2 Flag_1_date
# 1: 1 2016-01-01 1 0 <NA>
# 2: 1 2016-02-01 0 0 <NA>
# 3: 1 2016-03-01 0 1 2016-01-01
# 4: 1 2016-04-01 0 0 <NA>
# 5: 1 2016-05-01 1 0 <NA>
# 6: 1 2016-06-01 0 0 <NA>
# 7: 1 2016-07-01 0 1 2016-05-01
# 8: 2 2017-03-01 0 0 <NA>
# 9: 2 2017-04-01 1 0 <NA>
#10: 2 2017-05-01 0 0 <NA>
#11: 2 2017-06-01 0 0 <NA>
#12: 2 2017-07-01 0 1 2017-04-01
Upvotes: 3