Reputation: 151
I am having problems on 2 things but the problem with conditional cumsum is more important (and I guess less trivial). I would like to subtract 1 from the cumsum each row and if the result is <0, then do not use the negative number, but use 0 henceforth.
(1) cumsum not taking and using 0 when <0
(2) operate only on 2nd row onwards
Here is the sample table:
a <- data.table(
id1 = c(1,1,1,1,1,1,1,1,1,1),
id2 = c(1,2,3,4,5,6,7,8,9,10),
date = c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-05", "2020-01-06","2020-01-07","2020-01-08","2020-01-09","2020-01-10"),
DISPENSED_DURATION = c(4,0,0,0,0,0,0,4,0,0)
)
Here is the expected result:
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 4
2: 1 2 2020-01-02 0 3
3: 1 3 2020-01-03 0 2
4: 1 4 2020-01-04 0 1
5: 1 5 2020-01-05 0 0
6: 1 6 2020-01-06 0 0
7: 1 7 2020-01-07 0 0
8: 1 8 2020-01-08 4 3
9: 1 9 2020-01-09 0 2
10: 1 10 2020-01-10 0 1
Here is the current codes tried and results obtained:
a[, DISP_INVT := cumsum(DISPENSED_DURATION-1), id1]
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 3
2: 1 2 2020-01-02 0 2
3: 1 3 2020-01-03 0 1
4: 1 4 2020-01-04 0 0
5: 1 5 2020-01-05 0 -1
6: 1 6 2020-01-06 0 -2
7: 1 7 2020-01-07 0 -3
8: 1 8 2020-01-08 4 0
9: 1 9 2020-01-09 0 -1
10: 1 10 2020-01-10 0 -2
a[, DISP_INVT := ifelse(cumsum(DISPENSED_DURATION-1)<0,0,cumsum(DISPENSED_DURATION-1)), id1]
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 3
2: 1 2 2020-01-02 0 2
3: 1 3 2020-01-03 0 1
4: 1 4 2020-01-04 0 0
5: 1 5 2020-01-05 0 0
6: 1 6 2020-01-06 0 0
7: 1 7 2020-01-07 0 0
8: 1 8 2020-01-08 4 0
9: 1 9 2020-01-09 0 0
10: 1 10 2020-01-10 0 0
a[, DISP_INVT := cumsum(ifelse(DISPENSED_DURATION-1<0,0,DISPENSED_DURATION-1)), id1]
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 3
2: 1 2 2020-01-02 0 3
3: 1 3 2020-01-03 0 3
4: 1 4 2020-01-04 0 3
5: 1 5 2020-01-05 0 3
6: 1 6 2020-01-06 0 3
7: 1 7 2020-01-07 0 3
8: 1 8 2020-01-08 4 6
9: 1 9 2020-01-09 0 6
10: 1 10 2020-01-10 0 6
Thanks alot!
Upvotes: 3
Views: 107
Reputation: 683
A quick and dirty approach would be to group the data:
a %>%
mutate(
lag_disp = lag(a$DISPENSED_DURATION, default=-1),
change = DISPENSED_DURATION != lag_disp,
group = cumsum(as.integer(DISPENSED_DURATION & change))
) %>%
group_by(group) %>%
mutate(DISP_INVT = ifelse(
group == 1,
pmax(1+cumsum(DISPENSED_DURATION-1), 0),
pmax(cumsum(DISPENSED_DURATION-1), 0)
)) %>%
ungroup() %>%
select(-c(lag_disp, change, group))
# id1 id2 date DISPENSED_DURATION DISP_INVT
# <dbl> <dbl> <chr> <dbl> <dbl>
# 1 1 1 2020-01-01 4 4
# 2 1 2 2020-01-02 0 3
# 3 1 3 2020-01-03 0 2
# 4 1 4 2020-01-04 0 1
# 5 1 5 2020-01-05 0 0
# 6 1 6 2020-01-06 0 0
# 7 1 7 2020-01-07 0 0
# 8 1 8 2020-01-08 4 3
# 9 1 9 2020-01-09 0 2
# 10 1 10 2020-01-10 0 1
I'm sure there exists a more concise approach, but this should work.
Upvotes: 1
Reputation: 160397
You need a reducing/iterative calculation here instead of a cumulative/vector, because the value in one row depends on the calculated value of the previous.
a[, DISP_INVT := Reduce(function(prev,this) max(0, prev+this-1),
DISPENSED_DURATION, accumulate = TRUE)]
a
# id1 id2 date DISPENSED_DURATION DISP_INVT
# <num> <num> <char> <num> <num>
# 1: 1 1 2020-01-01 4 4
# 2: 1 2 2020-01-02 0 3
# 3: 1 3 2020-01-03 0 2
# 4: 1 4 2020-01-04 0 1
# 5: 1 5 2020-01-05 0 0
# 6: 1 6 2020-01-06 0 0
# 7: 1 7 2020-01-07 0 0
# 8: 1 8 2020-01-08 4 3
# 9: 1 9 2020-01-09 0 2
# 10: 1 10 2020-01-10 0 1
Upvotes: 4