Reputation: 371
I defined an R function that contains a matrix, a vector and a parameter a
. I need to compute the results of the function for different values of a
. This is simple to code in R
, but very slow when the matrix is "big" and number of the parameter values are large.
Can I define the function in R
and do the for loop in Rcpp
?
Can it speed up the computations?
A minimal example of a foo
function in R
is
f <- function(X,y,a){
p = ncol(X)
res = (crossprod(X) + a*diag(1,p))%*%crossprod(X,y)
}
set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)
result <- matrix(NA,ncol(X),length(a))
for(i in 1:length(a)){ # Can I do this part in Rcpp?
result[,i] <- f(X,y,a[i])
}
result
Upvotes: 2
Views: 913
Reputation: 73405
I simply suggest avoiding recomputing X'X
and X'y
in the loop as they are loop-invariant.
f <- function (XtX, Xty, a) (XtX + diag(a, ncol(XtX))) %*% Xty
set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)
result1 <- matrix(NA, ncol(X), length(a))
XtX <- crossprod(X)
Xty <- crossprod(X, y)
for(i in 1:length(a)) {
result1[,i] <- f(XtX, Xty, a[i])
}
## compare with your `result`
all.equal(result, result1)
#[1] TRUE
hours later...
When I return I see something more to simplify.
(XtX + diag(a, ncol(XtX))) %*% Xty = XtX %*% Xty + diag(a, ncol(XtX)) %*% Xty
= XtX %*% Xty + a * Xty
So actually, XtX %*% Xty
is loop-invariant, too.
f <- function (XtX.Xty, Xty, a) XtX.Xty + a * Xty
set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)
result2 <- matrix(NA, ncol(X), length(a))
XtX <- crossprod(X)
Xty <- c(crossprod(X, y)) ## one-column matrix to vector
XtX.Xty <- c(XtX %*% Xty) ## one-column matrix to vector
for(i in 1:length(a)) {
result2[,i] <- f(XtX.Xty, Xty, a[i])
}
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
And it turns out that we can get rid of the loop:
## "inline" function `f`
for(i in 1:length(a)) {
result2[,i] <- XtX.Xty + a[i] * Xty
}
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
## do it with recycling rule
for(i in 1:length(a)) {
result2[,i] <- a[i] * Xty
}
result2 <- XtX.Xty + result2
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
## now use `tcrossprod`
result2 <- XtX.Xty + tcrossprod(Xty, a)
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
I completely agree with you that your example code in the question is just a "foo"
. And you may not have thought carefully about it when posting it. However, it suffices to show that when writing a loop, either an R loop or a C / C++ / FORTRAN loop, we should always seek to pull those loop-invariant out of the loop to reduce computational complexity.
Your concern is to get speedup with Rcpp (or any compiled language). You want to vectorize a section of R code that is not readily vectorized. However, "%*%"
, crossprod
and tcrossprod
are mapped to BLAS (FORTRAN code) and are not R-level computations. You readily have everything vectorized.
Don't always blame the interpretation overhead (as R is an interpreted language) of an R-loop for the poor performance. Such overhead is insignificant if each iteration is doing some "heavy" computations, like big matrix computations (using BLAS) or fitting statistical models (like lm
). In fact, if you do want to write such a loop with a compiled language, use lapply
function. This function implements loop at C level, and calls R function for each iteration. Alternatively, Ralf's answer is an Rcpp equivalent. In my opinion, a loop nest written in R code is more likely to be inefficient.
Upvotes: 7
Reputation: 26843
The answer by 李哲源 correctly identifies what should be done in your case. As for your original question the answer is two-fold: Yes, you can move the loop to C++ with Rcpp. No, you won't gain performance:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix fillMatrix(Rcpp::NumericMatrix X,
Rcpp::NumericVector y,
Rcpp::NumericVector a,
Rcpp::Function f) {
Rcpp::NumericMatrix result = Rcpp::no_init(X.cols(), a.length());
for (int i = 0; i < a.length(); ++i) {
result(Rcpp::_, i) = Rcpp::as<Rcpp::NumericVector>(f(X, y, a[i]));
}
return result;
}
/*** R
f <- function(X,y,a){
p = ncol(X)
res = (crossprod(X) + a*diag(1,p))%*%crossprod(X,y)
}
X <- matrix(rnorm(500*50),500,50)
y <- rnorm(500)
a <- seq(0,1,0.01)
system.time(fillMatrix(X, y, a, f))
# user system elapsed
# 0.052 0.077 0.075
system.time({result <- matrix(NA,ncol(X),length(a))
for(i in 1:length(a)){
result[,i] <- f(X,y,a[i])
}
})
# user system elapsed
# 0.060 0.037 0.049
*/
So the Rcpp solution is actually slower than the R solution in this case. Why? Because the real work is done within the function f
. This is the same for both solutions, but the Rcpp solution has the additional overhead of calling back to R from C++. Note that for loops in R are not necessarily slow. BTW, here some benchmark data:
expr min lq mean median uq max neval
fillMatrixR() 41.22305 41.86880 46.16806 45.20537 49.11250 65.03886 100
fillMatrixC() 44.57131 44.90617 49.76092 50.99102 52.89444 66.82156 100
Upvotes: 5