iboboboru
iboboboru

Reputation: 1102

Count rows before event - data.table

I have a dataset, with users and sequential events and non-events inbetween.

DT = data.table(user = c("1001","1001","1001","1001","1001","1001",
                          "1002","1002","1002","1002"), 
               event = c(NA,"e1",NA,NA,NA,"e2",
                           "e1",NA,NA,"e2"))

I want to be able to count the rows(non-events) before an event occurs by user group. Expected result:

   user  event  rows.before.event
 1: 1001    NA                 NA
 2: 1001    e1                  1
 3: 1001    NA                 NA
 4: 1001    NA                 NA
 5: 1001    NA                 NA
 6: 1001    e2                  3
 7: 1002    e1                  0
 8: 1002    NA                 NA
 9: 1002    NA                 NA
10: 1002    e2                  2

Have tried with rleid() but no success. Any suggestions welcomed.

Upvotes: 3

Views: 198

Answers (4)

eddi
eddi

Reputation: 49448

DT[, count := .N-1, by = .(user, rev(cumsum(rev(!is.na(event)))))][
   is.na(event), count := NA]
#    user event count
# 1: 1001    NA    NA
# 2: 1001    e1     1
# 3: 1001    NA    NA
# 4: 1001    NA    NA
# 5: 1001    NA    NA
# 6: 1001    e2     3
# 7: 1002    e1     0
# 8: 1002    NA    NA
# 9: 1002    NA    NA
#10: 1002    e2     2

Upvotes: 8

Mike H.
Mike H.

Reputation: 14370

In case you wanted yet another way of achieving the same goal:

library(zoo)
DT$group <- rev(na.locf(rev(DT$event))) 
DT[, rowsbefore := lapply(.SD,function(x) {sum(is.na(x))}) , by = .(user,group)]
DT$rowsbefore <- ifelse(is.na(DT$event),NA,DT$rowsbefore)

> DT
    user event group rowsbefore
 1: 1001    NA    e1         NA
 2: 1001    e1    e1          1
 3: 1001    NA    e2         NA
 4: 1001    NA    e2         NA
 5: 1001    NA    e2         NA
 6: 1001    e2    e2          3
 7: 1002    e1    e1          0
 8: 1002    NA    e2         NA
 9: 1002    NA    e2         NA
10: 1002    e2    e2          2

If you don't want to replace the NA's and want the group sums, you can just leave out the last line.

Edit - Per @ Procrastinatus Maximus' comment below, a better way of doing my same solution:

DT[, rowsbefore := sum(is.na(event)), by = .(user, rev(na.locf(rev(event))))
   ][is.na(event), rowsbefore := NA]

Upvotes: 3

Jaap
Jaap

Reputation: 83275

A solution with rleid and shift:

DT[, before := .N, by = .(user, rleid(is.na(event)))
   ][, before := shift(before, fill = 0), by = user
     ][is.na(event), before := NA][]

which gives:

    user event before
 1: 1001    NA     NA
 2: 1001    e1      1
 3: 1001    NA     NA
 4: 1001    NA     NA
 5: 1001    NA     NA
 6: 1001    e2      3
 7: 1002    e1      0
 8: 1002    NA     NA
 9: 1002    NA     NA
10: 1002    e2      2

Upvotes: 6

Se&#241;or O
Se&#241;or O

Reputation: 17432

> DT[, rows.before.event:=  ifelse(is.na(event), NA, .N - 1) ,by = list(user, c(0, cumsum(!is.na(event))[-length(event)]))]
> DT
    user event rows.before.event
 1: 1001    NA                NA
 2: 1001    e1                 1
 3: 1001    NA                NA
 4: 1001    NA                NA
 5: 1001    NA                NA
 6: 1001    e2                 3
 7: 1002    e1                 0
 8: 1002    NA                NA
 9: 1002    NA                NA
10: 1002    e2                 2

Upvotes: 4

Related Questions