Taotao Tan
Taotao Tan

Reputation: 283

Faster evaluation of matrix multiplication from right to left

I noticed that evaluating matrix operations in quadratic form from right to left is significantly faster than left to right in R, depending on how the parentheses are placed. Obviously they both perform the same amount of calculation. I am wondering why this is the case. Does this have anything to do with memory allocation?

# A: 5000 * 5000
# B: 5000 * 2
A = matrix(runif(5000 * 5000), nrow = 5000)
B = matrix(rbinom(5000 * 2, size = 2, prob = 0.3), nrow = 5000)

microbenchmark((t(B) %*% A) %*% B, t(B) %*% (A %*% B), times = 100)

enter image description here

Here is the session info:

R version 4.2.0 (2022-04-22)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_1.0.9           microbenchmark_1.4.9

loaded via a namespace (and not attached):
 [1] compiler_4.2.0           fastmap_1.1.0            cli_3.3.0                htmltools_0.5.3          tools_4.2.0             
 [6] RcppArmadillo_0.11.2.4.0 rstudioapi_0.13          yaml_2.3.5               rmarkdown_2.14           knitr_1.39              
[11] xfun_0.31                digest_0.6.29            rlang_1.0.4              evaluate_0.15           

EDIT: A simplified version of the matrix multiplication which displays the same error.

k <- 5000L; m <- n <- 2L;
A <- matrix(rnorm(k * k), k, k);
B <- matrix(rnorm(k * n), k, n);
tB <- t(B);
microbenchmark::microbenchmark(tB %*% A, A %*% B, times = 100)

Upvotes: 14

Views: 572

Answers (1)

GKi
GKi

Reputation: 39717

It looks like this is a matter of the implementation and in which order the elements are accessed. See Why does the order of the loops affect performance when iterating over a 2D array?.

When doing the same but changing the order of loops the timing is different.

Rcpp::cppFunction("NumericMatrix mm(const NumericMatrix& A, const NumericMatrix& B) {
int M = A.nrow();
int N = A.ncol();
int P = B.ncol();
NumericMatrix res(M, P);
for (int n=0; n<N; ++n) {  //Loop n, p, m
  for (int p=0; p<P; ++p) {
    for (int m=0; m<M; ++m) {
      res[m+p*M] += A[m+M*n] * B[p*N+n];
    }
  }
}
return res;}")

Rcpp::cppFunction("NumericMatrix mm2(const NumericMatrix& A, const NumericMatrix& B) {
int M = A.nrow();
int N = A.ncol();
int P = B.ncol();
NumericMatrix res(M, P);
for (int m=0; m<M; ++m) {  //Loop m, p, n
  for (int p=0; p<P; ++p) {
     for (int n=0; n<N; ++n) {
      res[m+p*M] += A[m+M*n] * B[p*N+n];
    }
  }
}
return res;}")
k <- 5000L; m <- n <- 2L;
A <- matrix(rnorm(k * k), k, k);
B <- matrix(rnorm(k * n), k, n);
tB <- t(B);

met <- alist("(tB*A)B"     = tB %*% A %*% B,
             "tB(A*B)"     = tB %*% (A %*% B),
             "mm (tB*A)B"  = mm(mm(tB, A), B),
             "mm tB(A*B)"  = mm(tB, mm(A, B)),
             "mm2 (tB*A)B" = mm2(mm2(tB, A), B),
             "mm2 tB(A*B)" = mm2(tB, mm2(A, B)),
             "cp(B,A)B"    = crossprod(B, A) %*% B,
             "cp(B,A*B)"   = crossprod(B, A %*% B) )
bench::mark(exprs = met)
#  expression     min median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result  
#  <bch:expr>  <bch:> <bch:>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>  
#1 (tB*A)B     79.5ms 80.1ms    12.5 78.17KB       0     7     0   562ms <dbl[…]>
#2 tB(A*B)     33.8ms 34.4ms    28.4 78.17KB       0    15     0   528ms <dbl[…]>
#3 mm (tB*A)B  61.9ms 62.5ms    15.9  3.85MB       0     8     0   502ms <dbl[…]>
#4 mm tB(A*B)  19.9ms 20.7ms    48.1 83.16KB       0    25     0   520ms <dbl[…]>
#5 mm2 (tB*A)B 35.9ms 39.4ms    25.8 87.29KB       0    13     0   504ms <dbl[…]>
#6 mm2 tB(A*B) 47.8ms 48.1ms    20.6 83.16KB       0    11     0   535ms <dbl[…]>
#7 cp(B,A)B    44.1ms 44.5ms    22.4 80.42KB       0    12     0   536ms <dbl[…]>
#8 cp(B,A*B)   34.1ms 36.5ms    27.1 78.17KB       0    14     0   516ms <dbl[…]>

microbenchmark::microbenchmark(list = met)
#Unit: milliseconds
#        expr      min       lq     mean   median       uq      max neval
#     (tB*A)B 77.09484 77.86891 79.09483 78.44832 80.08971 87.05563   100
#     tB(A*B) 33.63306 34.22562 36.08482 35.14064 36.64080 51.39962   100
#  mm (tB*A)B 62.05235 64.14361 66.54568 65.16927 67.98617 75.96242   100
#  mm tB(A*B) 19.67066 20.28369 20.83781 20.53820 21.19940 23.64119   100
# mm2 (tB*A)B 35.31290 35.70006 36.62846 36.10282 37.41669 40.47473   100
# mm2 tB(A*B) 48.16574 49.70702 51.55844 50.26292 52.46479 67.44558   100
#    cp(B,A)B 43.18166 44.01366 45.28434 44.71301 46.41521 48.97891   100
#   cp(B,A*B) 33.62158 34.47070 35.84743 35.11853 36.55979 48.89021   100

Upvotes: 4

Related Questions