Mogens
Mogens

Reputation: 11

bounded cumulative sum with variable upper bound

I want to do a cumulative sum (likewise cumsum()) with an lower and upper bound so that the summation never goes below a lower or above an upper bound. However, the upper boundary should be a variable value given as a vector to the function, and should not be a fixed value as in this post link.

I tried to use the cumsum.bounded function proposed by @G. Grothendieck of the above mentioned function, however it only takes one upper bound.

Here is some example data:

test_data <- tibble(prec_et = c(1,2,-1,-3,5,4,5,4,3,-3,-2,4,3,4),
                    upper_bound = c(10,10,10,10,10,10,10,13,14,15,16,17,18,19))

and my expected output of the function should be like # [1] 1 3 2 0 5 9 10 13 14 11 9 13 16 19 unsing 0 as lower bound and the upper_bound column as variable upper bound.

or here the same as snip of an excel table: enter image description here

Thanks for any help on this topic!

Upvotes: 1

Views: 94

Answers (3)

s_baldur
s_baldur

Reputation: 33498

You can easily adapt the answers to another very similar question:

cumsum.bounded <- cppFunction(
'NumericVector cumsum_bounded (NumericVector x, NumericVector upper) {
    double acc = 0;
    double lower = 0;
    NumericVector res(x.size());
    for (int i = 0; i < x.size(); i++) {
        acc += x[i];
        if (acc < lower)         acc = lower;
        else if (acc > upper[i]) acc = upper[i];
        res[i] = acc;
    }
    return res;
}')


test_data$cs_bounded  <- cumsum.bounded(test_data$prec_et, test_data$upper_bound)

#    prec_et upper_bound cs_bounded
#      <dbl>       <dbl>      <dbl>
#  1       1          10          1
#  2       2          10          3
#  3      -1          10          2
#  4      -3          10          0
#  5       5          10          5
#  6       4          10          9
#  7       5          10         10
#  8       4          13         13
#  9       3          14         14
# 10      -3          15         11
# 11      -2          16          9
# 12       4          17         13
# 13       3          18         16
# 14       4          19         19

Where

test_data <- data.frame(
  prec_et = c(1, 2, -1, -3, 5, 4, 5, 4, 3, -3, -2, 4, 3, 4),
  upper_bound = c(10, 10, 10, 10, 10, 10, 10, 13, 14, 15, 16, 17, 18, 19)
)

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 269291

1) We used the test_data shown below since the input shown in the question gives a syntax error. This gives the same answer as in the image except for the 5th row which seems incorrect in the image.

We encode the upper bounds in the imaginary parts of the input and use Reduce.

library(dplyr)
test_data <- tibble(prec_et = c(1,2,-1,-3,5,4,5,4,3,-3,-2,4,4,4),
 upper_bound = c(10,10,10,10,10,10,13,14,15,16,17,18,19,20))

f <- function(x, y) pmax(0, pmin(x + Re(y), Im(y)))

test_data %>%
   mutate(cum = Reduce(
     f = f,
     x = prec_et[-1] + upper_bound[-1] * 1i,
     init = max(0, min(prec_et[1], upper_bound[1])),
     acc = TRUE))

giving

# A tibble: 14 × 3
   prec_et upper_bound   cum
     <dbl>       <dbl> <dbl>
 1       1          10     1
 2       2          10     3
 3      -1          10     2
 4      -3          10     0
 5       5          10     5
 6       4          10     9
 7       5          13    13
 8       4          14    14
 9       3          15    15
10      -3          16    12
11      -2          17    10
12       4          18    14
13       4          19    18
14       4          20    20

2) The above uses complex numbers to avoid having to deal with indices but if we do go with indices we can do it like this:

test_data %>%
   mutate(cum = Reduce(
     f = \(x, i) pmax(0, pmin(x + prec_et[i], upper_bound[i])), 
     x = row_number()[-1],
     init = max(0, min(prec_et[1], upper_bound[1])),
     acc = TRUE))

Upvotes: 1

Dave Rosenman
Dave Rosenman

Reputation: 1447

This should work

bounded_cumsum <- function(values, lower_bounds = -Inf, upper_bounds = Inf) {
  n <- length(values)
  
  # Recycle lower_bounds and upper_bounds if their lengths are 1
  if (length(lower_bounds) == 1) {
    lower_bounds <- rep(lower_bounds, n)
  }
  
  if (length(upper_bounds) == 1) {
    upper_bounds <- rep(upper_bounds, n)
  }

  
  # Initialize the result vector and the cumulative sum
  result <- numeric(n)
  current_sum <- 0
  
  for (i in seq_along(values)) {
    current_sum <- current_sum + values[i]
    
    # Apply the lower bound
    if (current_sum < lower_bounds[i]) {
      current_sum <- lower_bounds[i]
    }
    
    # Apply the upper bound
    if (current_sum > upper_bounds[i]) {
      current_sum <- upper_bounds[i]
    }
    
    result[i] <- current_sum
  }
  
  return(result)
}
test_data <- tibble(prec_et = c(1,2,-1,-3,5,4,5,4,3,-3,-2,4,3,4),
                    upper_bound = c(10,10,10,10,10,10,10,13,14,15,16,17,18,19))
bounded_cumsum(test_data$prec_et, 0, test_data$upper_bound)
[1]  1  3  2  0  5  9 10 13 14 11  9 13 16 19

If you want to improve the function above, you could add tests to make sure each lower_bound is less than each upper_bound. And test that the lengths of lower_bounds or upper_bounds are either 1 or equal to values. But as long as you use the function above wisely it should work.

Upvotes: 0

Related Questions