Mártonka
Mártonka

Reputation: 103

Defining closest observation in a data table

I have a data.table like the following:

    ID Date        Flag_1    Flag_2
    1  2016-01-01  1         0
    1  2016-02-01  0         0
    1  2016-03-01  0         1
    1  2016-04-01  0         0
    1  2016-05-01  1         0
    1  2016-06-01  0         0
    1  2016-07-01  0         1
    2  2017-03-01  0         0
    2  2017-04-01  1         0
    2  2017-05-01  0         0
    2  2017-06-01  0         0
    2  2017-07-01  0         1

I would like to create a Flag_1_date variable, which links the date of the preceding flag_1 event to the flag_2 event.

It is important to me, that only those flag_1 dates should be linked to the flag_2 event, which has the same ID as the flag_2 event.

    ID Date        Flag_1    Flag_2  Flag_1_date
    1  2016-01-01  1         0       0
    1  2016-02-01  0         0       0
    1  2016-03-01  0         1       2016-01-01
    1  2016-04-01  0         0       0
    1  2016-05-01  1         0       0
    1  2016-06-01  0         0       0
    1  2016-07-01  0         1       2016-05-01
    2  2017-03-01  0         0       0
    2  2017-04-01  1         0       0
    2  2017-05-01  0         0       0
    2  2017-06-01  0         0       2017-04-01
    2  2017-07-01  0         1       0

Upvotes: 1

Views: 106

Answers (2)

Frank
Frank

Reputation: 66819

I guess the idiomatic way is

DT[Flag_2 == 1, v := 
  DT[Flag_1 == 1][.SD, on=.(ID, Date), roll = TRUE, x.Date]
]

    ID       Date Flag_1 Flag_2          v
 1:  1 2016-01-01      1      0       <NA>
 2:  1 2016-02-01      0      0       <NA>
 3:  1 2016-03-01      0      1 2016-01-01
 4:  1 2016-04-01      0      0       <NA>
 5:  1 2016-05-01      1      0       <NA>
 6:  1 2016-06-01      0      0       <NA>
 7:  1 2016-07-01      0      1 2016-05-01
 8:  2 2017-03-01      0      0       <NA>
 9:  2 2017-04-01      1      0       <NA>
10:  2 2017-05-01      0      0       <NA>
11:  2 2017-06-01      0      0       <NA>
12:  2 2017-07-01      0      1 2017-04-01

How it works

  • x[i, v := expr] writes to the v column on rows selected by i.
  • .SD is a stand-in for the current Subset of Data, so the expr is equivalent to

    DT[Flag_1 == 1][DT[Flag_2 == 1], on=.(ID, Date), roll = TRUE, x.Date]
    
  • This line is another case of x[i, j]: i = DT[Flag_2 == 1] is used to select rows in x = DT[Flag_1 == 1]. The on= and roll= arguments pin down how this selection happens. For details, see ?data.table. The x.* prefix in j indicates we want the column from x, not i.

Upvotes: 2

mtoto
mtoto

Reputation: 24178

You could use na.locf() from the zoo package to forward fill Date when Flag_1==1 by "ID", and subsequently omit the values where Flag_2==0:

library(zoo)
library(data.table)

dt[Flag_1==1, Flag_1_date := Date][
  ,Flag_1_date := na.locf(Flag_1_date), "ID"][
   Flag_2==0, Flag_1_date := NA
  ]
> dt
#   ID       Date Flag_1 Flag_2 Flag_1_date
# 1:  1 2016-01-01      1      0        <NA>
# 2:  1 2016-02-01      0      0        <NA>
# 3:  1 2016-03-01      0      1  2016-01-01
# 4:  1 2016-04-01      0      0        <NA>
# 5:  1 2016-05-01      1      0        <NA>
# 6:  1 2016-06-01      0      0        <NA>
# 7:  1 2016-07-01      0      1  2016-05-01
# 8:  2 2017-03-01      0      0        <NA>
# 9:  2 2017-04-01      1      0        <NA>
#10:  2 2017-05-01      0      0        <NA>
#11:  2 2017-06-01      0      0        <NA>
#12:  2 2017-07-01      0      1  2017-04-01

Upvotes: 3

Related Questions