RomainD
RomainD

Reputation: 311

Dynamic matrix multiplication without loops

I'm stuck with the following problem:


Suppose that I have three pairs of sequences:

{a_1, ..., a_N}, {A_1, ..., A_T}, {b_1, ..., b_N}, {B_1, ..., B_T}, {c_1, ..., c_N}, {C_1, ..., C_T}.

My aim is to perform the following action (without looping!):

for (i in 1:N) {
  for (j in 1:N) {
    for (k in 1:N) {
      ret[i,j,k] <- \sum_{t=1}^T (a_i - A_t) * (b_j - B_t) * (c_k - C_t)
}}}

The reason why I don't want to loop is that there might be more pairs of sequences than simply those three. And I want to structure the code as "efficiently and flexible" as possible.

If we only have two pairs of sequences, then the problem is quite easy since it reduces to simple matrix multiplication (an (N x T) matrix for (a_i - A_t) and an (N x T) matrix for (b_i - B_t) where you multiply the first with the transpose of the second).

But once you have more than two pairs of sequences, I'm not sure whether it can be done without loops since the dimensionality of output depends on the number of pairs of sequences...


------------------------------------------ Related problem (Nov. 8th, 2013) ------------------------------------------

I got the first part successfully implemented thanks to @-mrip. But how would the code have to be altered if I want the following:

for (i in 1:N) {
  for (j in 1:N) {
    for (k in 1:N) {
      ret[i,j,k] <- \sum_{t=1}^T foo(a_i, a_i - A_t) * foo(b_j, b_j - B_t) * foo(c_k, c_k - C_t)
}}}

Where foo(a, a-A) is some common bivariate function. Is there a "general" solution, or do you need more information about the structure of foo(a, a-A)?

I tried it using the straightforward solution and simply implement the loops. Of course this is neither flexible (since I have to restrict myself in advance to the possible number of pairs/dimensions) nor fast (since both a and A can be large - although it might be the case that a is simply a scalar and A some set of observations).

I know that I might be demanding. But I' m totally stuck on this problem since quite some time... Hence any help is extremely welcome.

Upvotes: 3

Views: 1858

Answers (1)

mrip
mrip

Reputation: 15163

This actually doesn't require any matrix multiplication at all. It only requires taking outer products, and the final result can be put together efficiently and concisely exploiting the column-major layout of R matrices and multidimensional arrays. The computation can be made more efficient by expanding out the product in the loop into individual summands. This leads to the following implementation.

 vecout<-function(...)as.vector(outer(...))

 f2<-function(a,b,c,A,B,C){
 N<-length(a)
 t<-length(A)

 ab<-vecout(a,b)
 ret<-array(vecout(ab,c),c(N,N,N))*t

 ret<-ret - ab * sum(C)
 ret<-ret - vecout(a,rep(c,each = N)) * sum(B)
 ret<-ret - rep(vecout(b,c) * sum(A),each=N)
 ret<-ret + a * sum(B*C)
 ret<-ret + rep(b * sum(A*C),each=N)
 ret<-ret + rep(c * sum(A*B),each=N^2)
 ret<-ret - sum(A*B*C)
 ret
 }

We can compare the running time and check the correctness as follows. Here is the naive implementation:

f1<-function(a,b,c,A,B,C){
 N<-length(a)
 ret<-array(0,c(N,N,N))
 for(i in 1:N)
  for(j in 1:N)
   for(k in 1:N)
    ret[i,j,k]<-sum((a[i]-A)*(b[j]-B)*(c[k]-C))
 ret
 }

Te optimized version runs substantially faster and produces the same result, up to numerical precision error:

> a<-rnorm(100)
> b<-rnorm(100)
> c<-rnorm(100)
> A<-rnorm(200)
> B<-rnorm(200)
> C<-rnorm(200)
> system.time(r1<-f1(a,b,c,A,B,C))
   user  system elapsed 
  9.006   1.125  10.204 
> system.time(r2<-f2(a,b,c,A,B,C))
   user  system elapsed 
  0.203   0.033   0.242 
> max(abs(r1-r2))
[1] 1.364242e-12

If you have more than three sequences each, the same idea will work. It shouldn't be too hard to code up the general solution, in fact, it might well be possible to write the general solution with less lines of code than the hard-coded 3 sequence solution, although it will take a little thought to get all of the index manipulations just right.

On edit: OK, couldn't resist. Here's a general solution, with an arbitrary number of pairs, passed as the columns of two matrices:

f3<-function(a,A){
  subsets<-as.matrix(expand.grid(rep(list(c(F,T)),ncol(a))))
  ret<-array(0,rep(nrow(a),ncol(a)))

  for(i in 1:nrow(subsets)){
    sub<-as.logical(subsets[i,])
    temp<-Reduce(outer,as.list(data.frame(a[,sub,drop=F])),init=1)
    temp<-temp*sum(apply(A[,!sub,drop=F],1,prod))
    temp<-aperm(array(temp,dim(ret)),order(c(which(sub),which(!sub))))
    ret<-ret+temp*(-1)^(sum(!sub))
  } 
  ret
}

> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.258   0.056   0.303 
> max(abs(r3-r1))
[1] 9.094947e-13

--------------------- Edit again (Nov 8 2013) ---------------------

The answer give above is the most efficient when the arrays A, B, and C are large. If A, B, and C are much smaller than a, b, and c, then here is an alternate, more concise solution:

f4<-function(a,A){
  ret<-array(0,rep(nrow(a),ncol(a)))
  for(i in 1:nrow(A)){
    temp<- Reduce(outer,as.list(data.frame(a-rep(A[i,],each=nrow(a)))),init=1)
    ret<-ret + as.vector(temp )
  }
  ret
}

In the above setup, where a, b, c, have length 100 and A, B, C have length 200, this is slower than the other solution:

> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.704   0.092   0.256 
> system.time(r4<-f4(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
 65.824  19.060   3.553 
> max(abs(r3-r4))
[1] 2.728484e-12

However, if A, B, and C have length 1, then it is much faster:

> A<-rnorm(1)
> B<-rnorm(1)
> C<-rnorm(1)
> system.time(r3<-f3(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.796   0.172   0.222 
> system.time(r4<-f4(cbind(a,b,c),cbind(A,B,C)))
   user  system elapsed 
  0.180   0.012   0.017 
> max(abs(r3-r4))
[1] 7.105427e-15

Upvotes: 4

Related Questions