Jantje Houten
Jantje Houten

Reputation: 151

How to use cumsum but with condition if <0 then use 0

I am having problems on 2 things but the problem with conditional cumsum is more important (and I guess less trivial). I would like to subtract 1 from the cumsum each row and if the result is <0, then do not use the negative number, but use 0 henceforth.

(1) cumsum not taking and using 0 when <0
(2) operate only on 2nd row onwards

Here is the sample table:

a <- data.table(
    id1 = c(1,1,1,1,1,1,1,1,1,1),
    id2 = c(1,2,3,4,5,6,7,8,9,10),
    date = c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-05", "2020-01-06","2020-01-07","2020-01-08","2020-01-09","2020-01-10"),
    DISPENSED_DURATION = c(4,0,0,0,0,0,0,4,0,0)
)

Here is the expected result:

    id1 id2       date DISPENSED_DURATION DISP_INVT
 1:   1   1 2020-01-01                  4         4
 2:   1   2 2020-01-02                  0         3
 3:   1   3 2020-01-03                  0         2
 4:   1   4 2020-01-04                  0         1
 5:   1   5 2020-01-05                  0         0
 6:   1   6 2020-01-06                  0         0
 7:   1   7 2020-01-07                  0         0
 8:   1   8 2020-01-08                  4         3
 9:   1   9 2020-01-09                  0         2
10:   1   10 2020-01-10                 0         1

Here is the current codes tried and results obtained:

    a[, DISP_INVT := cumsum(DISPENSED_DURATION-1), id1]

    id1 id2       date DISPENSED_DURATION DISP_INVT
 1:   1   1 2020-01-01                  4         3
 2:   1   2 2020-01-02                  0         2
 3:   1   3 2020-01-03                  0         1
 4:   1   4 2020-01-04                  0         0
 5:   1   5 2020-01-05                  0        -1
 6:   1   6 2020-01-06                  0        -2
 7:   1   7 2020-01-07                  0        -3
 8:   1   8 2020-01-08                  4         0
 9:   1   9 2020-01-09                  0        -1
10:   1   10 2020-01-10                 0        -2
    a[, DISP_INVT := ifelse(cumsum(DISPENSED_DURATION-1)<0,0,cumsum(DISPENSED_DURATION-1)), id1]

    id1 id2       date DISPENSED_DURATION DISP_INVT
 1:   1   1 2020-01-01                  4         3
 2:   1   2 2020-01-02                  0         2
 3:   1   3 2020-01-03                  0         1
 4:   1   4 2020-01-04                  0         0
 5:   1   5 2020-01-05                  0         0
 6:   1   6 2020-01-06                  0         0
 7:   1   7 2020-01-07                  0         0
 8:   1   8 2020-01-08                  4         0
 9:   1   9 2020-01-09                  0         0
10:   1   10 2020-01-10                 0         0

    a[, DISP_INVT := cumsum(ifelse(DISPENSED_DURATION-1<0,0,DISPENSED_DURATION-1)), id1]

    id1 id2       date DISPENSED_DURATION DISP_INVT
 1:   1   1 2020-01-01                  4         3
 2:   1   2 2020-01-02                  0         3
 3:   1   3 2020-01-03                  0         3
 4:   1   4 2020-01-04                  0         3
 5:   1   5 2020-01-05                  0         3
 6:   1   6 2020-01-06                  0         3
 7:   1   7 2020-01-07                  0         3
 8:   1   8 2020-01-08                  4         6
 9:   1   9 2020-01-09                  0         6
10:   1   10 2020-01-10                 0         6

Thanks alot!

Upvotes: 3

Views: 107

Answers (2)

WilliamGram
WilliamGram

Reputation: 683

A quick and dirty approach would be to group the data:

a %>% 
  mutate(
    lag_disp = lag(a$DISPENSED_DURATION, default=-1),
    change = DISPENSED_DURATION != lag_disp,
    group = cumsum(as.integer(DISPENSED_DURATION & change))
  ) %>% 
  group_by(group) %>% 
  mutate(DISP_INVT = ifelse(
    group == 1,
    pmax(1+cumsum(DISPENSED_DURATION-1), 0),
    pmax(cumsum(DISPENSED_DURATION-1), 0)
  )) %>% 
  ungroup() %>% 
  select(-c(lag_disp, change, group))

#      id1   id2 date       DISPENSED_DURATION DISP_INVT
#    <dbl> <dbl> <chr>                   <dbl>     <dbl>
#  1     1     1 2020-01-01                  4         4
#  2     1     2 2020-01-02                  0         3
#  3     1     3 2020-01-03                  0         2
#  4     1     4 2020-01-04                  0         1
#  5     1     5 2020-01-05                  0         0
#  6     1     6 2020-01-06                  0         0
#  7     1     7 2020-01-07                  0         0
#  8     1     8 2020-01-08                  4         3
#  9     1     9 2020-01-09                  0         2
# 10     1    10 2020-01-10                  0         1

I'm sure there exists a more concise approach, but this should work.

Upvotes: 1

r2evans
r2evans

Reputation: 160397

You need a reducing/iterative calculation here instead of a cumulative/vector, because the value in one row depends on the calculated value of the previous.

a[, DISP_INVT := Reduce(function(prev,this) max(0, prev+this-1), 
                        DISPENSED_DURATION, accumulate = TRUE)]
a
#       id1   id2       date DISPENSED_DURATION DISP_INVT
#     <num> <num>     <char>              <num>     <num>
#  1:     1     1 2020-01-01                  4         4
#  2:     1     2 2020-01-02                  0         3
#  3:     1     3 2020-01-03                  0         2
#  4:     1     4 2020-01-04                  0         1
#  5:     1     5 2020-01-05                  0         0
#  6:     1     6 2020-01-06                  0         0
#  7:     1     7 2020-01-07                  0         0
#  8:     1     8 2020-01-08                  4         3
#  9:     1     9 2020-01-09                  0         2
# 10:     1    10 2020-01-10                  0         1

Upvotes: 4

Related Questions