Reputation: 161
I'm trying to evaluate double sums of the form sum_{i=1}^{n} sum_{j=1}^{n} f((X_i - X_j) / c) in R.
I know outer is a quick way to do it. I've tried the following

sum(f(outer(X, X, function(x, y) (x - y) / c)))

and while it seems to work, I'm not sure how fast it is compared to some alternatives. Does it make a difference in terms of speed to apply outer first and then my function, or vice versa? Is there a better way to do it?
Upvotes: 1
Views: 437
Reputation: 73285
I would like to point out first that you can write your code as

sum(f(outer(x, x, "-") / c))

This reduces function call overhead, as subtraction in R is already a function: try "-"(5, 2).
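As a quick illustration (a minimal sketch; z and cc are placeholder names introduced just for this check), the operator form gives exactly the same matrix as the anonymous-function form, with fewer R-level function calls:

"-"(5, 2)                                       ## subtraction called as a function: 3
z  <- rnorm(10); cc <- 2                        ## placeholder data for the check
m1 <- outer(z, z, function(a, b) (a - b) / cc)  ## anonymous-function version
m2 <- outer(z, z, "-") / cc                     ## operator version
all.equal(m1, m2)                               ## TRUE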
outer is fast enough for your application. The only case when it is suboptimal is when your function f is symmetric around 0, i.e., f(-u) = f(u). In that case, the optimal computation only sums over the lower triangular part of the combination matrix outer(x, x, "-"), multiplies that sum by 2 to cover the off-diagonal entries, and finally adds the diagonal contribution.
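Here is a small sketch of that identity, using lower.tri rather than the index construction used in g below (x0 and c0 are toy values, and cos stands in for a symmetric f):

x0 <- 1:6; c0 <- 2
M  <- outer(x0, x0, "-") / c0
sum(cos(M))                                          ## full double sum
2 * sum(cos(M[lower.tri(M)])) + length(x0) * cos(0)  ## lower triangle doubled + diagonal
## the two values agree because cos(-u) = cos(u)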
The following function does this. We generate (i, j) indices for the lower triangular part (excluding the diagonal) of the combination matrix; the lower triangular part of outer(x, x, "-") / c is then dx <- (x[i] - x[j]) / c. Now, if f is symmetric, the result is 2 * sum(f(dx)) + n * f(0), and this is faster than outer; if f is asymmetric, we have to do sum(f(dx)) + sum(f(-dx)) + n * f(0), and this won't have any advantage over outer.

## `x` is the vector, `f` is your function of interest, `c` is a constant
## `symmetric` is a switch; only set `TRUE` when `f` is symmetric around 0
g <- function (x, f, c, symmetric = FALSE) {
  n <- length(x)
  ## (i, j) indices of the strictly lower triangular part of an n x n matrix
  j <- rep.int(1:(n - 1), (n - 1):1)
  i <- sequence((n - 1):1) + j
  ## lower triangular part of outer(x, x, "-") / c
  dx <- (x[i] - x[j]) / c
  if (symmetric) 2 * sum(f(dx)) + n * f(0)
  else sum(f(dx)) + sum(f(-dx)) + n * f(0)
}
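To see what the two indexing lines produce, take n = 4 (purely illustrative); each row of the result is a (row, column) position strictly below the diagonal of a 4 x 4 matrix:

n <- 4
j <- rep.int(1:(n - 1), (n - 1):1)
i <- sequence((n - 1):1) + j
cbind(i, j)
#      i j
# [1,] 2 1
# [2,] 3 1
# [3,] 4 1
# [4,] 3 2
# [5,] 4 2
# [6,] 4 3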
Consider a small example. Let's take c = 2 and the vector x <- 1:500, together with a symmetric function f1 <- cos and an asymmetric function f2 <- sin. Let's do a benchmark:
x <- 1:500
f1 <- cos  ## symmetric around 0
f2 <- sin  ## asymmetric around 0
library(microbenchmark)
We first consider the symmetric case with f1. Remember to set symmetric = TRUE for g.
microbenchmark(sum(f1(outer(x,x,"-")/2)), g(x, f1, 2, TRUE))
#Unit: milliseconds
#                        expr      min       lq     mean   median       uq      max neval cld
# sum(f1(outer(x, x, "-")/2)) 32.79472 35.35316 46.91560 36.78152 37.63580 133.5494   100   b
#           g(x, f1, 2, TRUE) 20.24940 23.34324 29.97313 24.45638 25.33352 120.3278   100  a
Here we see that g is faster, since it evaluates f1 on only the n(n-1)/2 lower triangular differences rather than on all n^2 of them.
Now consider the asymmetric case with f2.
microbenchmark(sum(f2(outer(x,x,"-")/2)), g(x, f2, 2))
#Unit: milliseconds
#                        expr      min       lq     mean   median       uq      max neval cld
# sum(f2(outer(x, x, "-")/2)) 32.84412 35.55520 44.33684 36.95336 37.89508 134.2991   100   a
#                 g(x, f2, 2) 36.71572 39.11832 50.54516 40.25590 41.75060 142.5143   100   a
As expected, there is no advantage here: the asymmetric branch still evaluates f2 on roughly n^2 values (two calls over the n(n-1)/2 lower triangular differences, plus the diagonal term), and pays extra indexing overhead on top.
Of course, we also want to check that g is doing the correct computation. It is sufficient to consider a small example, with x <- 1:5.
x <- 1:5
#### symmetric case ####
sum(f1(outer(x, x, "-") / 2))
# [1] 14.71313
g(x, f1, 2, TRUE)
# [1] 14.71313
#### asymmetric case ####
sum(f2(outer(x, x, "-") / 2))
# [1] 0
g(x, f2, 2)
# [1] 0
So g is correct.
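One caveat on the asymmetric check: since sin is odd, every term cancels and both sums are trivially 0, so it may be reassuring to also try a function whose double sum is not zero (exp is just an illustrative choice):

f3 <- exp  ## asymmetric around 0, with a non-trivial sum
sum(f3(outer(x, x, "-") / 2))
g(x, f3, 2)
## the two calls should return the same value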
Upvotes: 3