Reputation: 179
I'm working on something which requires me to calculate the elements of a large square matrix repeatedly. The process involves reading data stored in another matrix, and then calculating the matrix elements. Presently I am using a double for loop to do this.
library(matrixcalc)
data <- matrix(nrow=3,ncol=1000)
for(x in 1:ncol(data)){
for(y in 1:ncol(data)){
matrix[x,y]=exp(-entrywise.norm(data[,x]-data[,y],2))
}
}
The problem is that this is extremely slow since my matrix is very large. What is the fastest alternative to this procedure?
Upvotes: 0
Views: 369
Reputation: 89097
Short and very fast:
mat <- exp(-as.matrix(dist(t(data))))
I would also recommend the fields::rdist
function as a faster alternative to dist
for computing the matrix of euclidean distances, so if loading the package is not an issue, consider:
library(fields)
mat <- exp(-rdist(t(data)))
To give you an idea of the speed improvement:
data <- matrix(runif(3000), nrow=3, ncol=1000)
OP <- function(data) {
require(matrixcalc)
mat <- matrix(0, ncol(data), ncol(data))
for(x in 1:ncol(data)){
for(y in 1:ncol(data)){
mat[x,y]=exp(-entrywise.norm(data[,x]-data[,y],2))
}
}
mat
}
flodel1 <- function(data) exp(-as.matrix(dist(t(data))))
flodel2 <- function(data) {
require(fields)
exp(-rdist(t(data)))
}
system.time(res1 <- OP(data))
# user system elapsed
# 22.708 2.080 24.602
system.time(res2 <- flodel1(data))
# user system elapsed
# 0.112 0.025 0.136
system.time(res3 <- flodel2(data))
# user system elapsed
# 0.048 0.000 0.049
(Note that in the case of OP
and flodel2
, these runtimes do not include the loading of the packages as they had been loaded prior to the tests.)
Upvotes: 3
Reputation: 132989
You can use colSums
instead of the inner loop. Based on the answer by @Sven Hohenstein:
nc <- ncol(data)
mat <- diag(nc)
for(x in 2:nc){
mat[x, 1:x] <- exp(-(colSums((data[ , 1:x] - data[ ,x])^2) ^ .5))
}
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
Upvotes: 1
Reputation: 81753
This should be considerably faster:
nc <- ncol(data)
mat <- diag(nc)
for(x in 2:nc){
for(y in 1:x){
mat[x, y] <- exp(-(sum((data[ , x] - data[ , y])^2) ^ .5))
}
}
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
Upvotes: 2
Reputation: 11926
R language uses column-major-order arrays. Changing the for loops order can give a boost in performance. Because this way, you access the memory in a more contiguous shape, enabling cpu-cache advantages.
for(y in 1:dim) //outer is y now
{
for(x in 1:dim) //now x is count inside
{
matrix[x,y]=exp(-entrywise.norm(data[,x]-data[,y],2))
}
}
Your "matrix" is 2D-array right?
If you need even more speed, you can unroll some of inner loop to have less branching load for cpu and better caching/prefetching.
for(y in 1:dim)
{
for(x in 1:(dim/8)) //lets imagine dimension is a multiple of 8
{
matrix[x,y]=exp(-entrywise.norm(data[,x]-data[,y],2))
matrix[x+1,y]=exp(-entrywise.norm(data[,x+1]-data[,y],2))
matrix[x+2,y]=exp(-entrywise.norm(data[,x+2]-data[,y],2))
matrix[x+3,y]=exp(-entrywise.norm(data[,x+3]-data[,y],2))
matrix[x+4,y]=exp(-entrywise.norm(data[,x+4]-data[,y],2))
matrix[x+5,y]=exp(-entrywise.norm(data[,x+5]-data[,y],2))
matrix[x+6,y]=exp(-entrywise.norm(data[,x+6]-data[,y],2))
matrix[x+7,y]=exp(-entrywise.norm(data[,x+7]-data[,y],2))
}
}
Upvotes: 1