Comp_Warrior

Reputation: 179

Fastest way to calculate matrix elements

I'm working on something that requires me to repeatedly calculate the elements of a large square matrix. The process reads data stored in another matrix and then computes each matrix element from it. At the moment I am using a double for loop:

library(matrixcalc)

data <- matrix(nrow = 3, ncol = 1000)  # filled in elsewhere
mat <- matrix(0, ncol(data), ncol(data))

for (x in 1:ncol(data)) {
   for (y in 1:ncol(data)) {
       mat[x, y] <- exp(-entrywise.norm(data[, x] - data[, y], 2))
   }
}

The problem is that this is extremely slow since my matrix is very large. What is the fastest alternative to this procedure?

Upvotes: 0

Views: 369

Answers (4)

flodel

Reputation: 89097

Short and very fast:

mat <- exp(-as.matrix(dist(t(data))))
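A quick sanity check on a small random matrix (just for illustration; `dat` is a stand-in for your real data): `dist` computes distances between *rows*, hence the `t(data)`, and the result matches the entrywise norm directly:

```r
set.seed(1)
dat <- matrix(runif(12), nrow = 3, ncol = 4)  # small stand-in for the real data

# dist() works on rows, so transpose to get column-vs-column distances
kmat <- exp(-as.matrix(dist(t(dat))))

# direct computation of one entry, for comparison
direct <- exp(-sqrt(sum((dat[, 1] - dat[, 3])^2)))
all.equal(kmat[1, 3], direct)  # TRUE
```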

I would also recommend the fields::rdist function as a faster alternative to dist for computing the matrix of Euclidean distances, so if loading the package is not an issue, consider:

library(fields)
mat <- exp(-rdist(t(data)))

To give you an idea of the speed improvement:

data <- matrix(runif(3000), nrow=3, ncol=1000)

OP <- function(data) {
  require(matrixcalc)
  mat <- matrix(0, ncol(data), ncol(data))
  for(x in 1:ncol(data)){
    for(y in 1:ncol(data)){
      mat[x,y]=exp(-entrywise.norm(data[,x]-data[,y],2))
    }
  }
  mat
}

flodel1 <- function(data) exp(-as.matrix(dist(t(data))))
flodel2 <- function(data) {
  require(fields)
  exp(-rdist(t(data)))
}

system.time(res1 <- OP(data))
#   user  system elapsed 
# 22.708   2.080  24.602 
system.time(res2 <- flodel1(data))
#   user  system elapsed 
#  0.112   0.025   0.136 
system.time(res3 <- flodel2(data))
#   user  system elapsed 
#  0.048   0.000   0.049 

(Note that in the case of OP and flodel2, these runtimes do not include the loading of the packages as they had been loaded prior to the tests.)

Upvotes: 3

Roland

Reputation: 132989

You can use colSums instead of the inner loop. Based on the answer by @Sven Hohenstein:

nc <- ncol(data)

mat <- diag(nc)

for(x in 2:nc){
  mat[x, 1:x] <- exp(-(colSums((data[ , 1:x] - data[ ,x])^2) ^ .5))
}

mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
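To see why the colSums version replaces the inner loop, here is a small check (random data, base R only): subtracting data[, x] from data[, 1:x] recycles the vector down each column, so colSums of the squared differences gives all the squared distances to column x in one call:

```r
set.seed(42)
dat <- matrix(runif(12), nrow = 3, ncol = 4)  # small stand-in for the real data
x <- 3

# vectorised: distances from column x to columns 1..x in one shot
vec <- colSums((dat[, 1:x] - dat[, x])^2)^0.5

# the same thing via an explicit loop, for comparison
loop <- sapply(1:x, function(y) sqrt(sum((dat[, y] - dat[, x])^2)))

all.equal(vec, loop)  # TRUE
```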

Upvotes: 1

Sven Hohenstein

Reputation: 81753

This should be considerably faster:

nc <- ncol(data)

mat <- diag(nc)

for(x in 2:nc){
   for(y in 1:x){
       mat[x, y] <- exp(-(sum((data[ , x] - data[ , y])^2) ^ .5))
   }
}

mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]

Upvotes: 2

huseyin tugrul buyukisik

Reputation: 11926

R stores arrays in column-major order. Swapping the order of the for loops can give a performance boost, because memory is then accessed more contiguously, making better use of the CPU cache.

 nc <- ncol(data)
 for (y in 1:nc)       # outer loop is y now
 {
     for (x in 1:nc)   # x varies fastest, moving down each column
     {
         mat[x, y] <- exp(-entrywise.norm(data[, x] - data[, y], 2))
     }
 }

Your "matrix" is a 2D array, right?

If you need even more speed, you can unroll part of the inner loop, reducing branching overhead for the CPU and improving caching/prefetching.

 for (y in 1:nc)
 {
     # step by 8, assuming the dimension is a multiple of 8
     for (x in seq(1, nc, by = 8))
     {
         mat[x,     y] <- exp(-entrywise.norm(data[, x]     - data[, y], 2))
         mat[x + 1, y] <- exp(-entrywise.norm(data[, x + 1] - data[, y], 2))
         mat[x + 2, y] <- exp(-entrywise.norm(data[, x + 2] - data[, y], 2))
         mat[x + 3, y] <- exp(-entrywise.norm(data[, x + 3] - data[, y], 2))
         mat[x + 4, y] <- exp(-entrywise.norm(data[, x + 4] - data[, y], 2))
         mat[x + 5, y] <- exp(-entrywise.norm(data[, x + 5] - data[, y], 2))
         mat[x + 6, y] <- exp(-entrywise.norm(data[, x + 6] - data[, y], 2))
         mat[x + 7, y] <- exp(-entrywise.norm(data[, x + 7] - data[, y], 2))
     }
 }

Upvotes: 1
