Anti
Anti

Reputation: 397

Efficient way to calculate cross products, multiply matrix and sum it up

First, let's produce some dummy data for reference:

X <- matrix(runif(27*27, 0, 1), nrow=27, ncol=27)
Y <- matrix(runif(27*27, 0, 1), nrow=27, ncol=27)

I have 2 matrices, X and Y. First, I'm going to calculate the cross product matrix for the first two column vectors of X using R's command

cp <- tcrossprod(X[,1], X[,2])

The result, cp, is now multiplied with the matrix Y and all products are summed up:

res <- sum(cp * Y, na.rm=T)

Now I'm looking for a fast and thus efficient way to perform this calculation in R for all combinations of column vectors of the matrix X. The results should be saved in a third matrix of same dimensions as X and Y, the matrix Z, at Z[i,j] for the i-th and j-th columns of X.

I already did this job with two stacked for loops:

Z <- matrix(nrow=27, ncol=27)
for (i in 1:ncol(X)) {
 for (j in 1:ncol(X)) {
  cp     <- tcrossprod(X[,i], X[,j])
  Z[i,j] <- sum(cp * Y)
 }
}

However, it isn't as fast as I want it to be.

Thus, I'd be very grateful if you could help me to find a solution which is faster than my stacked for loop solution.

Many thanks in advance!

PS: I have stored 13 matrices X in a list. The calculations should be performed for all of these matrices. However, I assume that once we find an efficient way for the calculation with 1 matrix, I could use this way together with lapply to do the whole operations on the complete list?!

Upvotes: 0

Views: 1489

Answers (2)

jogo
jogo

Reputation: 12569

Each element Z[i,j] can be written as a bilinear form. The rest is: puttig together all similar calculations for the matrix Z.
You can do:

Z <- t(X) %*% Y %*% X  ### or
Z <- crossprod(X, Y) %*% X

To compare this calculation with your code:

set.seed(42)
n <- 27
X <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Y <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)

Z <- matrix(nrow=n, ncol=n)
for (i in 1:ncol(X)) {
  for (j in 1:ncol(X)) {
    cp     <- tcrossprod(X[,i], X[,j])
    Z[i,j] <- sum(cp * Y)
  }
}

Z2 <- t(X) %*% Y %*% X
Z3 <- crossprod(X, Y) %*% X
sum(abs(Z2-Z))
sum(abs(Z3-Z))

If L is the list of your 13 matrices X. you can do:

lapply(L, function(X) crossprod(X, Y) %*% X)

Here is the benchmarking:

Z1 <- function(X) {
  Z <- matrix(nrow=27, ncol=27)
  for (i in 1:ncol(X)) {
    for (j in 1:ncol(X)) {
      cp     <- tcrossprod(X[,i], X[,j])
      Z[i,j] <- sum(cp * Y)
    }
  }
  return(Z)
}

library("microbenchmark")

microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#> microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#Unit: microseconds
# expr      min        lq       mean    median       uq      max neval cld
#   Z1 3563.167 3671.6355 4391.00888 3721.3380 3874.617 9423.808   100   b
#   Z2   26.558   27.3420   34.31214   35.5865   39.815   56.426   100  a 
#   Z3   24.779   25.1675   27.43546   26.0965   28.034   47.268   100  a 

The solutions from Ronak are not faster than the original code, i.e. they are loop-hiding:

fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)

microbenchmark(Z1=Z1(X), 
               R1=outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun)), 
               R2=t(sapply(seq_len(ncol(X)), function(x) 
                 sapply(seq_len(ncol(X)), function(y)  sum(tcrossprod(X[,x], X[,y]) *Y)))),
               R3=t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y)))),
               unit="relative")
# Unit: relative
# expr      min       lq     mean   median       uq       max neval cld
#   Z1 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000   100  a 
#   R1 1.207583 1.213846 1.195597 1.216147 1.223139  1.060187   100  ab
#   R2 1.225521 1.230332 1.487811 1.230852 1.299253 13.140022   100   b
#   R3 1.156546 1.158774 1.217766 1.160142 2.012623  1.098679   100  ab

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389235

We can use outer to apply for every combination of columns

fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)
outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun))

Or nested sapply

t(sapply(seq_len(ncol(X)), function(x) 
         sapply(seq_len(ncol(X)), function(y)  sum(tcrossprod(X[,x], X[,y]) *Y))))

Or with apply

t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y))))

This gives same result as your Z with two for-loop. I am not sure if there are any performance gains using any of the above approaches since we are not doing anything completely different here.

Upvotes: 1

Related Questions