Reputation: 507
I would like to generate numbers in a new column that lead to occurrence of an event in another column. What might be the most straight forward way to do this using either R or python?
current data:
var1 var2 event
0.658 72.193 0
0.641 70.217 0
0.641 40.173 0
0.652 52.687 0
0.531 50.652 0
0.529 39.497 1
0.651 29.291 0
0.634 59.548 0
0.711 51.925 0
0.635 75.772 0
0.710 53.378 1
0.660 87.744 0
0.540 62.547 0
0.618 38.050 0
0.602 60.978 1
desired output:
var1 var2 event event_lead
0.658 72.193 0 -5
0.641 70.217 0 -4
0.641 40.173 0 -3
0.652 52.687 0 -2
0.531 50.652 0 -1
0.529 39.497 1 0
0.651 29.291 0 -4
0.634 59.548 0 -3
0.711 51.925 0 -2
0.635 75.772 0 -1
0.710 53.378 1 0
0.660 87.744 0 -3
0.540 62.547 0 -2
0.618 38.050 0 -1
0.602 60.978 1 0
Upvotes: 0
Views: 76
Reputation: 887251
Using R
, we can try with data.table
. We create a grouping variable (cumsum(event == 1)
), based on that get the reverse sequence, multiply with -1
and assign (:=
) it to 'event_lead'. Then, we multiply that output with the logical vector (!event
) so that whereever there is 1 in 'event' becomes 0 for the 'event_lead'.
library(data.table)
setDT(df1)[, event_lead:=-(.N:1) ,cumsum(event == 1)
][, event_lead := event_lead* (!event)]
df1
# var1 var2 event event_lead
# 1: 0.658 72.193 0 -5
# 2: 0.641 70.217 0 -4
# 3: 0.641 40.173 0 -3
# 4: 0.652 52.687 0 -2
# 5: 0.531 50.652 0 -1
# 6: 0.529 39.497 1 0
# 7: 0.651 29.291 0 -4
# 8: 0.634 59.548 0 -3
# 9: 0.711 51.925 0 -2
#10: 0.635 75.772 0 -1
#11: 0.710 53.378 1 0
#12: 0.660 87.744 0 -3
#13: 0.540 62.547 0 -2
#14: 0.618 38.050 0 -1
#15: 0.602 60.978 1 0
Or we can use ave
from base R
with(df1, ave(event, cumsum(event == 1), FUN = function(x)
rev(seq_along(x) )* - 1) * (!event))
#[1] -5 -4 -3 -2 -1 0 -4 -3 -2 -1 0 -3 -2 -1 0
Or as @thelatemail mentioned
with(df1, ave(event, rev(cumsum(rev(event))),
FUN=function(x) seq_along(x) - length(x)) )
Upvotes: 2