Reputation: 33
Trying to assign a sequential value for each row up until the condition is met, inclusive of that row. I.e. for a sample such as:
a b
FALSE 30.53
FALSE 27.80
FALSE 26.93
TRUE 41.66
FALSE 2.86
FALSE 16.31
TRUE 40.19
I'd like to create an additional column that counts the values up until a changes to TRUE, inclusive of that row, and then reset back to 0 afterwards, as well as sum up the values in column b, so:
a b c d
FALSE 30.53 1 30.53
FALSE 27.80 2 58.33
FALSE 26.93 3 85.26
TRUE 41.66 4 126.92
FALSE 2.86 1 2.86
FALSE 16.31 2 19.17
TRUE 40.19 3 59.36
I've tried using
setDT(temp)[,c:=seq_len(.N), by=.(cumsum(a == TRUE))]
but that results in:
a b c
FALSE 30.53 1
FALSE 27.80 2
FALSE 26.93 3
TRUE 41.66 1
FALSE 2.86 2
FALSE 16.31 3
TRUE 40.19 1
Any ideas on a better way to do this
Upvotes: 3
Views: 1520
Reputation: 887148
We can create a grouping column based on the logical column by taking the cumulative sum and getting the lag
of that output, then do the cumsum
on the column 'b'
library(dplyr)
df1 %>%
group_by(grp = lag(cumsum(a), default = 0)) %>%
mutate(c = row_number(), d = cumsum(b)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 7 x 4
# a b c d
# <lgl> <dbl> <int> <dbl>
#1 FALSE 30.5 1 30.5
#2 FALSE 27.8 2 58.3
#3 FALSE 26.9 3 85.3
#4 TRUE 41.7 4 127.
#5 FALSE 2.86 1 2.86
#6 FALSE 16.3 2 19.2
#7 TRUE 40.2 3 59.4
Or using data.table
with the same logic, grouped by the shift
of cumulative sum of 'a', create the 'd' column as the cumsum
of 'b',
library(data.table)
setDT(df1)[, c('c', 'd') := .(1:.N, cumsum(b)),
.(grp = shift(cumsum(a), fill = 0))]
df1 <- structure(list(a = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
TRUE), b = c(30.53, 27.8, 26.93, 41.66, 2.86, 16.31, 40.19)),
class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 2
Reputation: 388982
Using data.table
you can create a group column by using cumsum
of a
values and for each group create two new columns, one with row number and another with cumulative values of b
.
library(data.table)
setDT(temp)[,c('c', 'd') := .(seq_len(.N), cumsum(b)),shift(cumsum(a), fill = 0)]
temp
# a b c d
#1: FALSE 30.53 1 30.53
#2: FALSE 27.80 2 58.33
#3: FALSE 26.93 3 85.26
#4: TRUE 41.66 4 126.92
#5: FALSE 2.86 1 2.86
#6: FALSE 16.31 2 19.17
#7: TRUE 40.19 3 59.36
Upvotes: 0