Serkan

Subset data.table conditionally by group

I have the following data.table:

   id time event
1:  1    1     1
2:  1    2     0
3:  2    1     0
4:  2    2     1

# Generate data
library(data.table)

data <- data.table(
    id = c(1,1,2,2),
    time = c(1,2,1,2),
    event = c(1,0,0,1)
)

Within each id, I want to retain all rows up to and including the first row where event == 1, so that the data looks like:

   id time event
1:  1    1     1
2:  2    1     0
3:  2    2     1

I'm looking for a data.table solution.

Answers (1)

r2evans

# Per id, keep a row only if no event occurred on an earlier row of that id
data[, .SD[cumsum(shift(event, type = "lag", fill = 0) > 0) < 1,], by = id]
#       id  time event
#    <num> <num> <num>
# 1:     1     1     1
# 2:     2     1     0
# 3:     2     2     1

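Here, cumsum(x) < 1 means "keep rows until x is first TRUE": TRUE counts as 1 in a sum, so the cumulative sum stays 0 while every value so far is FALSE, and it is at least 1 from the first TRUE onward. Applying that test to event == 1 directly would drop the row on which the event happens, but we want to keep that row, so we test the lagged value instead, shift(event, type = "lag"), i.e. whether an event occurred on an earlier row.

Because this runs with by = id, the lag restarts in each group, and its default value for the first row of each group is NA. That NA would propagate through cumsum() to every later row of the group, so we use fill = 0 to act as a non-event preceding the first row.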
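To see how the pieces combine, here is a small sketch that rebuilds the question's data and stores each intermediate vector of the filter as a column. The column names prev_event, n_prev and keep are hypothetical helpers for inspection only; the one-liner never creates them.

```r
library(data.table)

# Same example data as in the question
data <- data.table(
  id    = c(1, 1, 2, 2),
  time  = c(1, 2, 1, 2),
  event = c(1, 0, 0, 1)
)

# Hypothetical helper columns, one per step of the filter, computed per id
steps <- copy(data)
steps[, prev_event := shift(event, type = "lag", fill = 0), by = id]  # event on the previous row (0 for the first row)
steps[, n_prev := cumsum(prev_event > 0), by = id]                    # number of events before this row
steps[, keep := n_prev < 1]                                           # TRUE up to and including the first event row
print(steps)
steps$keep
# [1]  TRUE FALSE  TRUE  TRUE

# Without fill = 0, the first lagged value is NA and cumsum() carries it forward
cumsum(shift(c(0, 1, 0), type = "lag") > 0)
# [1] NA NA NA
```

Filtering steps on keep leaves the same three (id, time, event) rows as the one-liner.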
