sharp
sharp

Reputation: 75

Cumulative sum of one column based on rolling window in another column

I am trying to find the cumulative sum of x=c(1,2,3,4,5) based on moving window of y=c(1,1,2,3,2). I tried the following code based on rollapply in zoo package:

rollapply(c(1,2,3,4,5),c(1,1,2,3,2),cumsum,align="right",partial=T)

My expected answer is 1,2,5,9,9 (because I am cumulating from the left side) but instead I get the below 4x3 table:

enter image description here

How can I correct the code to get the desired result?

Upvotes: 1

Views: 86

Answers (2)

jblood94
jblood94

Reputation: 16981

Vectorized using cumsum and diff:

c(
  diff(
    matrix(
      c(0, cumsum(x))[c(seq_along(y) - y, seq_along(y)) + 1], 2, length(y), 1
    )
  )
)
#> [1] 1 2 5 9 9

Benchmarking:

set.seed(0)
x <- seq.int(1e4)
y <- ceiling(x*runif(x))

library(zoo)

microbenchmark::microbenchmark(
  rollapply = rollapply(x, y, sum, align = "right", partial = T),
  mapply = mapply(\(a, b) sum(x[b + 1 - a:1]), y, seq_along(y)),
  cumsum = c(diff(matrix(c(0, cumsum(x))[c(seq_along(y) - y, seq_along(y)) + 1], 2, length(y), 1))),
  check = "equal",
  times = 10,
  unit = "relative"
)
#> Unit: relative
#>       expr       min        lq      mean    median       uq       max neval
#>  rollapply 2170.9093 2068.8286 1692.3103 1505.4102 1557.881 1542.7317    10
#>     mapply  741.6908  699.2924  632.0529  511.8272  643.252  666.9789    10
#>     cumsum    1.0000    1.0000    1.0000    1.0000    1.000    1.0000    10

Upvotes: 0

ThomasIsCoding
ThomasIsCoding

Reputation: 101335

I think you should use sum instead of cumsum

> zoo::rollapply(c(1, 2, 3, 4, 5), c(1, 1, 2, 3, 2), sum, align = "right", partial = T)
[1] 1 2 5 9 9

and you can use mapply as well to achieve the same output

mapply(\(a, b) sum(x[b + 1 - a:1]), y, seq_along(y))

Benchmark

set.seed(0)
x <- seq.int(1e4)
y <- sort(sample(1:10, length(x), TRUE))

library(zoo)
microbenchmark(
  rollapply = rollapply(x, y, sum, align = "right", partial = T),
  mapply = mapply(\(a, b) sum(x[b + 1 - a:1]), y, seq_along(y)),
  check = "identical",
  times = 10L,
  unit = "relative"
)

shows rollapply is not efficient as mapply

Unit: relative
      expr      min      lq     mean  median       uq      max neval
 rollapply 9.362684 9.32441 9.340693 9.31007 9.985161 9.342603    10
    mapply 1.000000 1.00000 1.000000 1.00000 1.000000 1.000000    10

Upvotes: 3

Related Questions