Reputation: 7373
My question is yet another "Vectorize this!" one. A similar question appeared elsewhere (Efficient way of calculating quadratic forms: avoid for loops?), but somehow I can't seem to make it work for my case.
I want to calculate the quadratic form x'Sx for every p-dimensional observation x in a sample of size n. I couldn't figure out nice, vectorized code, so my last resort is to do it with a for loop. The following toy example is for p = 2, n = 100.
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x <- cbind(x1,x2)
Sigma <- matrix(c(1, 2, 3, 4), ncol = 2)
z <- rep(0, n)
for (i in 1:n) {
  z[i] <- x[i, ] %*% solve(Sigma, x[i, ])  # quadratic form x' S^{-1} x
}
Like many other R users who worship vectorized code, I found the use of a for loop emotionally painful. To ease the pain, I rewrote my code using a couple of common vectorization techniques.
ap   <- function(Sigma, x) apply(x, 1, function(xi) xi %*% solve(Sigma, xi))
lap  <- function(Sigma, x) unlist(lapply(seq_len(nrow(x)), function(i) x[i, ] %*% solve(Sigma, x[i, ])))
loop <- function(Sigma, x) {
  z <- rep(0, nrow(x))
  for (i in seq_len(nrow(x))) {
    z[i] <- x[i, ] %*% solve(Sigma, x[i, ])
  }
  z
}
But the speed comparison shows that not much is gained.
library(microbenchmark)
microbenchmark(lap(Sigma, x), ap(Sigma, x), loop(Sigma, x))
# Unit: milliseconds
# expr min lq mean median uq max neval
# lap(Sigma, x) 4.207434 4.444895 5.092389 4.616912 5.283504 8.440802 100
# ap(Sigma, x) 4.360204 4.523306 5.317304 4.685396 5.412771 10.168674 100
# loop(Sigma, x) 4.518645 4.679317 6.204626 4.827831 5.438908 94.115144 100
Is there any room for improvement, or should I turn to Rcpp to free myself from the sin of using for loops?
Upvotes: 4
Views: 662
Reputation: 4841
The following does not apply to your example here, but it might in many other cases. Suppose your matrix is positive definite, e.g. Sigma <- matrix(c(1, 1, 1, 4), ncol = 2). Then you can use backsolve after computing a Cholesky decomposition, which is numerically a better idea and is also faster in this case:
lap <- function(Sigma, x) unlist(lapply(seq_len(nrow(x)), function(i) x[i, ] %*% solve(Sigma, x[i, ])))

bench::mark(lap = lap(Sigma, x),
            product = diag(x %*% solve(Sigma) %*% t(x)),
            `product smarter` = diag(x %*% tcrossprod(solve(Sigma), x)),
            colsum = colSums(t(x) * (solve(Sigma) %*% t(x))),
            `colsum (smarter)` = rowSums(x %*% solve(Sigma) * x),
            backsolve = {
              ch <- chol(Sigma)
              colSums(backsolve(ch, t(x), transpose = TRUE)^2)
            })
#R> # A tibble: 6 x 13
#R> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#R> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#R> 1 lap 800.1µs 878µs 1113. 1.66KB 12.7 527 6 474ms <dbl [… <Rprofmem… <bch:tm… <tibble …
#R> 2 product 30.4µs 32µs 29895. 83.92KB 53.9 9982 18 334ms <dbl [… <Rprofmem… <bch:tm… <tibble …
#R> 3 product smarter 28.2µs 30µs 32300. 82.31KB 61.5 9981 19 309ms <dbl [… <Rprofmem… <bch:tm… <tibble …
#R> 4 colsum 20.5µs 22.3µs 44005. 5.66KB 13.2 9997 3 227ms <dbl [… <Rprofmem… <bch:tm… <tibble …
#R> 5 colsum (smarter) 15.1µs 16.2µs 60101. 4.05KB 12.0 9998 2 166ms <dbl [… <Rprofmem… <bch:tm… <tibble …
#R> 6 backsolve 15µs 16.3µs 59162. 5.66KB 11.8 9998 2 169ms <dbl [… <Rprofmem… <bch:tm… <tibble …
Notice also the faster variant of the colSums version, rowSums(x %*% solve(Sigma) * x) (the `colsum (smarter)` entry in the benchmark above), which avoids the transpose call.
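Why the Cholesky route works: chol(Sigma) returns an upper triangular R with Sigma = R'R, so x'Sigma^{-1}x = ||R^{-T}x||^2, and each observation needs only one cheap triangular solve instead of a full solve(). Here is a minimal sketch checking this, assuming the positive definite Sigma above (the intermediate names ch, y, quad are illustrative):
Sigma <- matrix(c(1, 1, 1, 4), ncol = 2)     # the positive definite example from above
ch <- chol(Sigma)                            # upper triangular R with Sigma = R'R
y <- backsolve(ch, t(x), transpose = TRUE)   # column i solves R' y_i = x_i, i.e. y_i = R^{-T} x_i
quad <- colSums(y^2)                         # ||R^{-T} x_i||^2 = x_i' Sigma^{-1} x_i
all.equal(quad, colSums(t(x) * (solve(Sigma) %*% t(x))))  # should agree with the colsum version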
Upvotes: 0
Reputation: 7373
Thank you very much for the concern, @konvas, @beginneR. I was mistakenly calculating x'S^{-1}x, whereas the answer given by @Ben Bolker in the above link calculates x'Sx. colSums(t(x) * (solve(Sigma) %*% t(x))) solved my problem altogether. I am now an even more devoted worshipper of vectorized code.
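To see why that one-liner works, it can be decomposed step by step (a sketch reusing Sigma and x from the question; the intermediate names S_inv_xt, elemwise, z_vec are illustrative):
S_inv_xt <- solve(Sigma) %*% t(x)  # p x n matrix: column i is Sigma^{-1} x_i
elemwise <- t(x) * S_inv_xt        # column i is x_i * (Sigma^{-1} x_i), elementwise
z_vec <- colSums(elemwise)         # column i sums to x_i' Sigma^{-1} x_i
all.equal(z_vec, lap(Sigma, x))    # should match the loop/lapply result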
microbenchmark(lap(Sigma, x),
               product = diag(x %*% solve(Sigma) %*% t(x)),
               colsum = colSums(t(x) * (solve(Sigma) %*% t(x))))
#Unit: microseconds
# expr min lq mean median uq max neval
# lap(Sigma, x) 4283.616 4384.9215 4961.9761 4475.6920 4700.3885 9472.096 100
# product 126.835 134.3315 165.3030 144.0575 199.5725 306.349 100
# colsum 92.391 102.9265 130.0202 110.8285 146.2855 346.061 100
Upvotes: 0
Reputation: 14346
If you store the rows of x in a list and use vapply instead of lapply, you can speed this up a bit, as follows.
# First, make a list of the rows of x
xl <- vector("list",nrow(x))
for (i in seq_along(xl)) xl[[i]] <- x[i, ]
# Apply solve
solve.mat <- vapply(xl, solve, numeric(2), a = Sigma)
# Take the dot product of each pair of elements
result <- colSums(solve.mat * t(x))
all(result == lap(Sigma, x))
# [1] TRUE
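As an aside, the row list can also be built in one call with asplit (this assumes base R >= 3.6, where asplit is available; asplit returns 1-d arrays, so the dims are stripped here):
xl2 <- lapply(asplit(x, 1), as.vector)  # list of plain numeric row vectors
all.equal(colSums(vapply(xl2, solve, numeric(2), a = Sigma) * t(x)), result)  # should be TRUE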
Writing it in one step and comparing:
library(microbenchmark)
microbenchmark(lap = lap(Sigma, x),
csums = colSums(vapply(xl, solve, numeric(2), a = Sigma) * t(x)))
# Unit: milliseconds
# expr min lq mean median uq max neval
# lap 3.013343 3.050855 3.164558 3.097901 3.136355 4.206923 100
# csums 2.224350 2.263772 2.354349 2.289751 2.317672 3.660294 100
Upvotes: 1