Reputation: 21
I'm relatively new to R programming, and this website has been very helpful for me so far, but I was unable to find a question that already covered what I want to know. So I decided to post a question myself.
My problem is the following: I want to find efficient ways to compute cumulative sums over four-dimensional arrays, i.e. I have data in a four-dimensional array x and want to write a function that computes an array x_sum such that
x_sum[i,j,k,l] = sum_{ind1 <= i, ind2 <= j, ind3 <= k, ind4 <=l} x[ind1, ind2, ind3, ind4].
I want to use this function billions of times, which makes it very important that it be as efficient as possible. Although I have come up with several ways to calculate the sums (see below), I suspect more experienced R programmers might be able to find a more efficient solution. So if anyone can suggest a better way of doing this, I would be very grateful.
Here's what I've tried so far:
I have found three different implementations (each of which brought a gain in speed) that work (see code below): One in R using the cumsum() function (cumsum_4R) and two implementations where the „heavy lifting“ is done in C (using the .C() interface). The first implementation in C is merely a naive attempt to write the sums using nested for-loops and pointer arithmetic (cumsumC_4_old). In the second C-implementation (cumsumC_4) I tried to adapt my code using the ideas in the following article
As you can see in the source code below, the adaptation is relatively lopsided: For some dimensions, I was able to replace all the nested for-loops but not for others. Do you have ideas on how to do that?
Using microbenchmark on the three implementations, I get the following result for arrays of size 40x40x40x40:
Unit: milliseconds
expr min lq mean median uq
cumsum_4R(x) 976.13258 1029.33371 1064.35100 1051.37782 1074.23234
cumsumC_4_old(x) 174.72868 177.95875 192.75392 184.11121 203.18141
cumsumC_4(x) 56.87169 57.73512 67.34714 63.20269 68.80326
max neval
1381.5832 50
283.2384 50
105.7099 50
Additional information: 1) Since this made it easier to install any needed packages, I ran the benchmarks on my personal computer under Windows, but I plan on running the finished simulations on a computer from my university, which runs under Linux.
EDIT: 2) The four-dimensional data x[i,j,k,l] is actually the result of the product of two applications of the outer function: First, the outer product of a matrix with itself (i.e. outer(mat,mat)) and then taking the pairwise minima of another matrix (i.e. outer(mat2, mat2, pmin)). Then the data is the product
x = outer(mat, mat) * outer(mat2, mat2, pmin),
i.e.
x[i,j,k,l] = mat[i,j] * mat[k,l] * min(mat2[i,j], mat2[k,l])
The four-dimensional array has the corresponding symmetries.
3)The reason I need these cumulative sums in the first place is that I want to run simulations of a test for which I need partial sums over „rectangles“ of indices: I want to iterate over all sums of the form
sum_{k1<=i1 <= m1,k2<=i2 <= m2, k1 <= i3 <= m1, k2 <= i4 <=m2} x[i1, i2, i3, i4],
where 1<=k1<=m1<=n, 1<=k2<=m2<=n. In order to avoid calculating the sum of the same variables over and over again, I first calculate all the cumulative sums and then calculate the sums over rectangles as functions of the cumulative sums. Do you know of a more efficient way to do this? EDIT to 3): In order to include all potentially important aspects: I also want to calculate sums of the form
sum_{k1<=i1 <= m1,k2<=i2 <= m2, 1 <= i3 <= n, 1 <= i4 <=n} x[i1, i2, i3, i4].
(Since I can trivially obtain them using the cumulative sums, I had not included this specification before).
Here is the C code I use (which I save as „cumsumC.c“):
#include<R.h>
#include<math.h>
#include <stdio.h>
int min(int a, int b){
if(a <= b) return a;
else return b;
}
void cumsumC_4_old(double* x, int* nv){
int n = *nv;
int n2 = n*n;
int n3 = n*n*n;
//Dim 1
for(int i=0; i<n; i++){
for(int j=0; j<n; j++){
for(int k=0; k<n; k++){
for(int l=1; l<n; l++){
x[i+j*n+k*n2+l*n3] += x[i + j*n +k*n2+(l-1)*n3];
}
}
}
}
//Dim 2
for(int i=0; i<n; i++){
for(int j=0; j<n; j++){
for(int k=1; k<n; k++){
for(int l=0; l<n; l++){
x[i+j*n+k*n2+l*n3] += x[i + j*n +(k-1)*n2+l*n3];
}
}
}
}
//Dim 3
for(int i=0; i<n; i++){
for(int j=1; j<n; j++){
for(int k=0; k<n; k++){
for(int l=0; l<n; l++){
x[i+j*n+k*n2+l*n3] += x[i + (j-1)*n +k*n2+l*n3];
}
}
}
}
//Dim 4
for(int i=1; i<n; i++){
for(int j=0; j<n; j++){
for(int k=0; k<n; k++){
for(int l=0; l<n; l++){
x[i+j*n+k*n2+l*n3] += x[i-1 + j*n +k*n2+l*n3];
}
}
}
}
}
void cumsumC_4(double* x, int* nv){
int n = *nv;
int n2 = n*n;
int n3 = n*n*n;
long ind1, ind2;
long index, indexges = n +(n-1)*n+(n-1)*n2+(n-1)*n3, indexend;
//Dim 1
index = n3;
while(index != indexges){
x[index] += x[index-n3];
index++;
}
//Dim 2
long teilind = n+(n-1)*n;
for(int k=1; k<n; k++){
ind1 = k*n2;
ind2 = ind1 - n2;
for(int l=0; l<n; l++){
index = l*n3;
indexend = teilind+index;
while(index != indexend){
x[index+ind1] += x[index+ind2];
index++;
}
}
}
//Dim 3
ind1 = n;
while(ind1 < n+(n-1)*n){
index = 0;
indexend = indexges - ind1;
ind2 = ind1-n;
while(index < indexend){
x[ind1+index] += x[ind2+index];
index += n2;
}
ind1++;
}
//Dim 4
index = 0;
int i;
long minind;
while(index < indexges){
i = 1;
minind = min(indexges, index+n);
while(index+i < minind){
x[index+i] += x[index+i-1];
i++;
}
index+=n;
}
}
Here is the R function „cumsum_4R“ and code used to call and compare the cumulative sum functions in R (under Windows; for Linux, the commands dyn.load/dyn.unload need to be adjusted; ideally, I want to use the functions on 50^4 size arrays, but since the call to microbenchmark would then take a while, I have chosen n=40 here):
library("microbenchmark")
# dyn.load("cumsumC.so")
dyn.load("cumsumC.dll")
cumsum_4R <- function(x){
return(aperm(apply(apply(aperm(apply(apply(x, 2:4,function(a) cumsum(as.numeric(a))), c(1,3,4) , function(a) cumsum(as.numeric(a))), c(2,1,3,4)), c(1,2,4), function(a) cumsum(as.numeric(a))), 1:3, function(a) cumsum(as.numeric(a))), c(3,4,2,1)))
}
cumsumC_4_old <- function(x){
n <- dim(x)[1]
arr <- array(.C("cumsumC_4_old", res=as.double(x), as.integer(n))$res, dim=c(n,n,n,n))
return(arr)
}
cumsumC_4 <- function(x){
n <- dim(x)[1]
arr <- array(.C("cumsumC_4", res=as.double(x), as.integer(n))$res, dim=c(n,n,n,n))
return(arr)
}
set.seed(1234)
n <- 40
x <- array(rnorm(n^4),dim=c(n,n,n,n))
r <- 6 #parameter for rounding results for comparison
res1 <- cumsum_4R(x)
res2 <- cumsumC_4_old(x)
res3 <- cumsumC_4(x)
print(c("Identical R and C1:", identical(round(res1,r),round(res2,r))))
print(c("Identical R and C2:",identical(round(res1,r),round(res3,r))))
times <- microbenchmark(cumsum_4R(x), cumsumC_4_old(x),cumsumC_4(x),times=50)
print(times)
dyn.unload("cumsumC.dll")
# dyn.unload("cumsumC.so")
Thank you for your help!
Upvotes: 2
Views: 383
Reputation: 4474
You can indeed use points 2 and 3 in your original question to solve the problem more efficiently. Actually, this makes the problem separable. By separable I mean that the limits of the 4 sums in Equation 3 do not depend on the variables you sum over. This and the fact that x
is an outer product of 2 matrices enables you to separate the 4-fold sum in Eq. 3 into an outer product of two 2-fold sums. Even better: the 2 matrices used to define x
are the same (denoted by mat
by you) - so the two 2-fold sums give the same matrix, which has to be calculated only once.
Here the code:
set.seed(1234)
n=40
mat=array(rnorm(n^2),dim=c(n,n))
x=outer(mat,mat)
cumsum_sep=function(x) {
#calculate matrix corresponding to 2-fold sums
#actually it's just one matrix because x is an outer product of mat with itself
tmp=t(apply(apply(x,2,cumsum),1,cumsum))
#outer product of two-fold sums
outer(tmp,tmp)
}
y1=cumsum_4R(x)
#note that cumsum_sep operates on the original matrix mat!
y2=cumsum_sep(mat)
Check whether results are the same
all.equal(y1,y2)
[1] TRUE
This gives the benchmark results
microbenchmark(cumsum_4R(x),cumsum_sep(mat),times=10)
Unit: milliseconds
expr min lq mean median uq max neval cld
cumsum_4R(xx) 2084.454155 2135.852305 2226.59692 2251.95928 2270.15198 2402.2724 10 b
cumsum_sep(x) 6.844939 7.145546 32.75852 14.45762 34.94397 120.0846 10 a
Quite a difference! :)
Upvotes: 1