Thomas
Thomas

Reputation: 501

Vectorize a loop where iterations are dependent on previous ones in R

I have the following working code:

x <- c(10.7, 13.0, 11.4, 11.5, 12.5, 14.1, 14.8, 14.1, 12.6, 16.0, 11.7, 10.6,
       10.0, 11.4, 7.9, 9.5, 8.0, 11.8, 10.5, 11.2, 9.2, 10.1, 10.4, 10.5)

cusum <- function(x) {
  
  s <- NA
  mn <- mean(x)
  
  for (i in seq_along(x)) {
  if (i == 1)
    s[i] <- 0 + x[i] - mn
  else
    s[i] <- s[i - 1] + x[i] - mn
  }
  
  s
  
}

cusum(x)

I wish to vectorize my code for performance, but I do not know how because:

How can I vectorize my function in base R? I work in a restrictive environment where I only have access to base R.

Upvotes: 1

Views: 140

Answers (2)

r2evans
r2evans

Reputation: 160417

@Cettt's answer is the most direct, and if that's what is needed, stick with it. However, if your point is that you don't know how to vectorize an arbitrary function, then ...

When your calculations rely on a value and the previous value, that's one thing, and vectorizing can be done. However, it looks as if your calculations rely on a previous calculation, in which case it's a different thing altogether. Your function appears to be the latter, so here's an attempt:

func <- function(x) {
  mn <- mean(x)
  Reduce(function(a, b) {
    a + b - mn
  }, x, init = 0, accumulate = TRUE)[-1]
}

func(x)
#  [1] -6.96e-01  9.08e-01  9.12e-01  1.02e+00  2.12e+00  4.82e+00  8.23e+00
#  [8]  1.09e+01  1.21e+01  1.67e+01  1.70e+01  1.62e+01  1.49e+01  1.49e+01
# [15]  1.14e+01  9.47e+00  6.07e+00  6.47e+00  5.58e+00  5.38e+00  3.19e+00
# [22]  1.89e+00  8.96e-01 -1.60e-14

Walk-through:

  • Reduce first calls func(x[1], x[2]), let's call that result res[1]. It then calls func(res[1], x[2]) (and stores in res[2]). Then func(res[2], x[3]), etc.

  • Since you have a different first-condition (if (i == 1)), then we use init=0 to seed that initial state. We could also have done something like:

    func <- function(x) {
      mn <- mean(x)
      Reduce(function(a, b) {
        if (is.null(a)) {
          0 + b - mn
        } else {
          a + b - mn
        }
      }, x, init = NULL, accumulate = TRUE)[-1]
    }
    

    allowing for a more-complex initial condition.

    Since we defined init=, we also need to drop the first returned value in this case, ergo the [-1].

  1. Reduce defaults to only returning the last calculated value (-1.60e-14 here), so we add accumulate=TRUE to capture all of the calculated values.

Upvotes: 2

Cettt
Cettt

Reputation: 11981

You don't need loops for this particular iteration: just use cumsum on x - mean(x):

cumsum(x - mean(x))
 [1] -6.958333e-01  9.083333e-01  9.125000e-01  1.016667e+00  2.120833e+00  4.825000e+00  8.229167e+00  1.093333e+01  1.213750e+01
[10]  1.674167e+01  1.704583e+01  1.625000e+01  1.485417e+01  1.485833e+01  1.136250e+01  9.466667e+00  6.070833e+00  6.475000e+00
[19]  5.579167e+00  5.383333e+00  3.187500e+00  1.891667e+00  8.958333e-01 -1.598721e-14

Upvotes: 2

Related Questions