Reputation: 16004
I have a data.table that looks like this
ID, Order, Segment
1, 1, A
1, 2, B
1, 3, B
1, 4, C
1, 5, B
1, 6, B
1, 7, B
1, 8, B
Basically by ordering the data using the Order column. I would like to understand the number of consecutive B's for each of the ID's. Ideally the output I would like is
ID, Consec
1, 2
1, 4
Because the segment B appears consecutively in row 2 and 3 (2 times), and then again in row 5,6,7,8 (4 times).
The loop solution is quite obvious but would also be very slow.
Are there elegant solutions in data.table that is also fast?
P.S. The data I am dealing with has ~20 million rows.
Upvotes: 5
Views: 357
Reputation: 886938
Try
library(data.table)#v1.9.5+
DT[order(ID, Order)][, indx:=rleid(Segment)][Segment=='B',
list(Consec=.N), by = list(indx, ID)][,indx:=NULL][]
# ID Consec
#1: 1 2
#2: 1 4
Or as @eddi suggested
DT[order(ID, Order)][, .(Consec = .N), by = .(ID, Segment,
rleid(Segment))][Segment == 'B', .(ID, Consec)]
# ID Consec
#1: 1 2
#2: 1 4
A more memory efficient method would be to use setorder
instead of order
(as suggested by @Arun)
setorder(DT, ID, Order)[, .(Consec = .N), by = .(ID, Segment,
rleid(Segment))][Segment == 'B', .(ID, Consec)]
# ID Consec
#1: 1 2
#2: 1 4
Upvotes: 10