Reputation: 735
Recently I stumpled upon the Rcpp package and saw it's massive speed improvements especially on for loops. I am programming a simulation that constantly does matrix algebra on 5D-arrays. An example looks like this:
library(rbenchmark)
Mat<- assign("Mat",array(10,c(10,10,9,4,9)))
Mat2<-matrix(c(runif(16,0,1)),nrow=4,ncol=4)
FuncR<-function(x){
for(i in 1:dim(Mat)[1]){
for(j in 1:dim(Mat)[2]){
for(loc in 1:dim(Mat)[3]){
for(Yr in 1:dim(Mat)[5]){
Mat[i,j,loc,,Yr]<<-floor(x%*%Mat[i,j,loc,,Yr])
}
}
}
}
}
benchmark(FuncR(Mat2))
test replications elapsed relative user.self sys.self user.child sys.child
1 FuncR(Mat2) 100 17.008
1 7.228 0.016 0 0
Now I was briefly reading some introduction on how to use the Rcpp package and before I dive deeper I wanted to know whether in cases like the described above I can really gain something. I am asking because for me as a non programmer it would mean some time investment so I want to make sure it is worth. The rational behind the complex arrays is that every step of my script does a matrix multiplication over a different dimension of the array to simulate a development with time for various related things. I am open for improvements and am not sticking to a particular way of setting it up. However I also failed getting improvements using vectorization
Upvotes: 1
Views: 120
Reputation: 9687
Alternatively, you might consider the tensor
library:
> require(tensor)
> MT <- tensor(Mat2, Mat, 2, 4)
> dim(MT)
[1] 4 10 10 9 9
> MT[,1,1,1,1]
[1] 6.357168721 18.288843057 21.215756948 10.310288982
Upvotes: 1
Reputation: 9687
You should get similar performance to Rcpp just by using matrix multiply a bit more directly.
Use aperm
to bring the 4th dimension to the front, then flatten to 2 dimensions. You can do one %*%
and then reverse the process.
> Mat3 <- aperm(Mat, c(4,1,2,3,5))
> dim(Mat3)
[1] 4 10 10 9 9
> dim(Mat3) <- c(4,prod(10,10,9,9))
> Mat4 <- Mat2 %*% Mat3
> dim(Mat4)
[1] 4 8100
> dim(Mat4) <- c(4,10,10,9,9)
> Mat5 <- aperm(Mat4, c(2,3,4,1,5))
To double check:
# From your inner loop
> Mat2 %*% Mat[1,1,1,,1]
[,1]
[1,] 6.357168721
[2,] 18.288843057
[3,] 21.215756948
[4,] 10.310288982
> Mat5[1,1,1,,1]
[1] 6.357168721 18.288843057 21.215756948 10.310288982
LGTM.
I've seen this pattern quite a bit in "scholarly" code, not so much in industry. It does trade off some memory for shortness of code, because the multiply is no longer in place.
If you have control of other parts of the code base, you might consider changing the memory layout of your Mat
object and then the aperms
won't be necessary
Upvotes: 2
Reputation: 13304
I feel much more comfortable working with data frames rather than slices in 5D-space, so here is my approach to this problem:
library(dplyr)
melt(Mat) %>%
setNames(c('i','j','loc','k','Yr','value')) %>%
group_by(i,j,loc,Yr) %>%
mutate(out=floor(Mat2 %*% value))
Of course, similar solution can be obtained using data.table
. Both dplyr
and data.table
solutions are expected to be faster than your code.
Upvotes: 1