pyne
pyne

Reputation: 507

generate numbers before an event

I would like to generate numbers in a new column that lead to occurrence of an event in another column. What might be the most straight forward way to do this using either R or python?

current data:

var1    var2    event
0.658   72.193  0
0.641   70.217  0
0.641   40.173  0
0.652   52.687  0
0.531   50.652  0
0.529   39.497  1
0.651   29.291  0
0.634   59.548  0
0.711   51.925  0
0.635   75.772  0
0.710   53.378  1
0.660   87.744  0
0.540   62.547  0
0.618   38.050  0
0.602   60.978  1

desired output:

var1    var2    event   event_lead
0.658   72.193  0         -5
0.641   70.217  0         -4
0.641   40.173  0         -3
0.652   52.687  0         -2
0.531   50.652  0         -1
0.529   39.497  1          0
0.651   29.291  0         -4
0.634   59.548  0         -3
0.711   51.925  0         -2
0.635   75.772  0         -1
0.710   53.378  1          0
0.660   87.744  0         -3
0.540   62.547  0         -2
0.618   38.050  0         -1
0.602   60.978  1          0

Upvotes: 0

Views: 76

Answers (1)

akrun
akrun

Reputation: 887251

Using R, we can try with data.table. We create a grouping variable (cumsum(event == 1)), based on that get the reverse sequence, multiply with -1 and assign (:=) it to 'event_lead'. Then, we multiply that output with the logical vector (!event) so that whereever there is 1 in 'event' becomes 0 for the 'event_lead'.

library(data.table)
setDT(df1)[, event_lead:=-(.N:1) ,cumsum(event == 1)
         ][, event_lead := event_lead* (!event)]
df1
#    var1   var2 event event_lead
# 1: 0.658 72.193     0         -5
# 2: 0.641 70.217     0         -4
# 3: 0.641 40.173     0         -3
# 4: 0.652 52.687     0         -2
# 5: 0.531 50.652     0         -1
# 6: 0.529 39.497     1          0
# 7: 0.651 29.291     0         -4
# 8: 0.634 59.548     0         -3
# 9: 0.711 51.925     0         -2
#10: 0.635 75.772     0         -1
#11: 0.710 53.378     1          0
#12: 0.660 87.744     0         -3
#13: 0.540 62.547     0         -2
#14: 0.618 38.050     0         -1
#15: 0.602 60.978     1          0

Or we can use ave from base R

with(df1, ave(event, cumsum(event == 1), FUN = function(x)
                rev(seq_along(x) )* - 1) * (!event))
#[1] -5 -4 -3 -2 -1  0 -4 -3 -2 -1  0 -3 -2 -1  0

Or as @thelatemail mentioned

with(df1, ave(event, rev(cumsum(rev(event))), 
           FUN=function(x) seq_along(x) - length(x)) ) 

Upvotes: 2

Related Questions