Reputation: 1102
I have a dataset, with users and sequential events and non-events inbetween.
DT = data.table(user = c("1001","1001","1001","1001","1001","1001",
"1002","1002","1002","1002"),
event = c(NA,"e1",NA,NA,NA,"e2",
"e1",NA,NA,"e2"))
I want to be able to count the rows(non-events) before an event occurs by user group. Expected result:
user event rows.before.event
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
Have tried with rleid()
but no success. Any suggestions welcomed.
Upvotes: 3
Views: 198
Reputation: 49448
DT[, count := .N-1, by = .(user, rev(cumsum(rev(!is.na(event)))))][
is.na(event), count := NA]
# user event count
# 1: 1001 NA NA
# 2: 1001 e1 1
# 3: 1001 NA NA
# 4: 1001 NA NA
# 5: 1001 NA NA
# 6: 1001 e2 3
# 7: 1002 e1 0
# 8: 1002 NA NA
# 9: 1002 NA NA
#10: 1002 e2 2
Upvotes: 8
Reputation: 14370
In case you wanted yet another way of achieving the same goal:
library(zoo)
DT$group <- rev(na.locf(rev(DT$event)))
DT[, rowsbefore := lapply(.SD,function(x) {sum(is.na(x))}) , by = .(user,group)]
DT$rowsbefore <- ifelse(is.na(DT$event),NA,DT$rowsbefore)
> DT
user event group rowsbefore
1: 1001 NA e1 NA
2: 1001 e1 e1 1
3: 1001 NA e2 NA
4: 1001 NA e2 NA
5: 1001 NA e2 NA
6: 1001 e2 e2 3
7: 1002 e1 e1 0
8: 1002 NA e2 NA
9: 1002 NA e2 NA
10: 1002 e2 e2 2
If you don't want to replace the NA's and want the group sums, you can just leave out the last line.
Edit - Per @ Procrastinatus Maximus' comment below, a better way of doing my same solution:
DT[, rowsbefore := sum(is.na(event)), by = .(user, rev(na.locf(rev(event))))
][is.na(event), rowsbefore := NA]
Upvotes: 3
Reputation: 83275
A solution with rleid
and shift
:
DT[, before := .N, by = .(user, rleid(is.na(event)))
][, before := shift(before, fill = 0), by = user
][is.na(event), before := NA][]
which gives:
user event before
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
Upvotes: 6
Reputation: 17432
> DT[, rows.before.event:= ifelse(is.na(event), NA, .N - 1) ,by = list(user, c(0, cumsum(!is.na(event))[-length(event)]))]
> DT
user event rows.before.event
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
Upvotes: 4