kingp23
kingp23

Reputation: 33

R how to cumulative sums up until condition, including the row where the condition changes

Trying to assign a sequential value for each row up until the condition is met, inclusive of that row. I.e. for a sample such as:

    a b 
FALSE 30.53 
FALSE 27.80
FALSE 26.93
TRUE 41.66
FALSE 2.86
FALSE 16.31
TRUE 40.19

I'd like to create an additional column that counts the values up until a changes to TRUE, inclusive of that row, and then reset back to 0 afterwards, as well as sum up the values in column b, so:

     a b c d
FALSE 30.53 1 30.53
FALSE 27.80 2 58.33
FALSE 26.93 3 85.26
TRUE 41.66 4 126.92
FALSE 2.86 1 2.86
FALSE 16.31 2 19.17
TRUE 40.19 3 59.36

I've tried using

 setDT(temp)[,c:=seq_len(.N), by=.(cumsum(a == TRUE))]

but that results in:

      a b c 
FALSE 30.53 1 
FALSE 27.80 2 
FALSE 26.93 3 
TRUE 41.66 1 
FALSE 2.86 2 
FALSE 16.31 3 
TRUE 40.19 1

Any ideas on a better way to do this

Upvotes: 3

Views: 1520

Answers (2)

akrun
akrun

Reputation: 887148

We can create a grouping column based on the logical column by taking the cumulative sum and getting the lag of that output, then do the cumsum on the column 'b'

library(dplyr)
df1 %>%
    group_by(grp = lag(cumsum(a), default = 0)) %>%
    mutate(c = row_number(), d = cumsum(b)) %>%
    ungroup %>%
    select(-grp)

-output

# A tibble: 7 x 4
#  a         b     c      d
#  <lgl> <dbl> <int>  <dbl>
#1 FALSE 30.5      1  30.5 
#2 FALSE 27.8      2  58.3 
#3 FALSE 26.9      3  85.3 
#4 TRUE  41.7      4 127.  
#5 FALSE  2.86     1   2.86
#6 FALSE 16.3      2  19.2 
#7 TRUE  40.2      3  59.4 
 

Or using data.table with the same logic, grouped by the shift of cumulative sum of 'a', create the 'd' column as the cumsum of 'b',

library(data.table)
setDT(df1)[,  c('c', 'd') := .(1:.N, cumsum(b)), 
   .(grp = shift(cumsum(a), fill = 0))]

data

df1 <- structure(list(a = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, 
TRUE), b = c(30.53, 27.8, 26.93, 41.66, 2.86, 16.31, 40.19)),
class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388982

Using data.table you can create a group column by using cumsum of a values and for each group create two new columns, one with row number and another with cumulative values of b.

library(data.table)
setDT(temp)[,c('c', 'd') := .(seq_len(.N), cumsum(b)),shift(cumsum(a), fill = 0)]
temp
#       a     b c      d
#1: FALSE 30.53 1  30.53
#2: FALSE 27.80 2  58.33
#3: FALSE 26.93 3  85.26
#4:  TRUE 41.66 4 126.92
#5: FALSE  2.86 1   2.86
#6: FALSE 16.31 2  19.17
#7:  TRUE 40.19 3  59.36

Upvotes: 0

Related Questions