Hello
Hello

Reputation: 181

Returning bunch of matrices using RCPP in C++ in an efficient way using a list

I am trying to return a bunch of matrices using RCPP. My code below is extremely inefficient. I would like to know if the following code can be efficient.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List hello( 
    const arma::rowvec& g,
    const int& n, 
    const int& p,
    const arma::mat& S,
    const arma::mat& zc,
    const arma::rowvec& dl){
  Rcpp::List ht(n);

  for(int t=0; t < n;++t){

    arma::mat hhat(p,n);
    hhat.fill(0.0);
    for(int i = 0;i < n; ++i){
      arma::mat h(p,1);
      h.fill(0.0);
      if (t > i){
        for(int u=i;u <= t; ++u){
          arma::rowvec zr = zc.rows(i,i);
          h += exp(arma::as_scalar(g*zr.t())) * (zr.t() - S.cols(u,u))*dl(u);
        }
      }
      hhat.cols(i,i) = h;
    }
    ht[t] = hhat;
  }

  // Specify list length
  Rcpp::List res(1);
  res[0] = ht;

  return(res);
}

Here is the example.

g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(4800),nrow=3,ncol=1600)
dl=runif(1600)
z=matrix(runif(4800),nrow=1600,ncol=3)
ptm=proc.time();kkk= hello(g=g,n=n,p=p,S = S,zc=z,dl = dl);proc.time()-ptm;
 user  system elapsed 
  31.25    0.00   31.30 

Any help would be appreciated.

Following the updated code. Initially I was returning list of a list. Now it returns a list. This reduces the computing time by 10 seconds. I hope this code can be improved further.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List hello( 
    const arma::rowvec& g,
    const int& n, 
    const int& p,
    const arma::mat& S,
    const arma::mat& zc,
    const arma::rowvec& dl){
  Rcpp::List ht(n);


  for(int t=0; t < n;++t){

    arma::mat hhat(p,n);
    hhat.zeros();
    for(int i = 0;i < n; ++i){
      arma::mat h(p,1);
      // h.fill(0.0);
      h.zeros();
      if (t > i){
        for(int u=i;u <= t; ++u){
          //arma::rowvec zr = zc.rows(i,i);
          h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
        }
      }
      hhat.col(i) = h;
    }
    ht[t] = hhat;
  }

  // Specify list length
  // Rcpp::List res(1);
  // res[0] = ht;

  return(ht);
}

The formula that I am trying to implement is given below.enter image description here

Upvotes: 1

Views: 440

Answers (2)

Ralf Stubner
Ralf Stubner

Reputation: 26833

In my other answer I looked at the efficiency of returning data and at simple optimizations. Here I want to look at something different: Optimization of the algorithm.

You want to compute hhat(i, t) for 0 <= i, t < n and i < t. Looking at your formula we see that the dependency of hhat on i and t is very different. In particular, hhat(i, t + 1) can be written as hhat(i, t) + something. Right now your outer loop is over t and you are recomputing all these intermediate values. By switching the loop order, it is easy to do each such computation only once, bringing the algorithm down to a two nested loops. This means you have to generate the resulting matrices separately. And since you cannot store an arma::mat inside a Rcpp::List, I need an additional std::vector for storage:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
Rcpp::List hello( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){

    std::vector<arma::mat> foo(n);
    for(int t=0; t < n;++t){
        arma::mat hhat(p,n);
        hhat.zeros();
        foo[t] = hhat;
    }

    for(int i = 0;i < n; ++i){
        arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
        for(int t=i+1; t < n;++t){
            h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
            foo[t].col(i) = h;
        }
    }

    Rcpp::List ht(n);
    for(int t=0; t < n;++t){
        ht[t] = foo[t];
    }

    return(ht);
}

// [[Rcpp::export]]
Rcpp::List hello_orig( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){
    Rcpp::List ht(n);


    for(int t=0; t < n;++t){

        arma::mat hhat(p,n);
        hhat.zeros();
        for(int i = 0;i < n; ++i){
            arma::mat h(p,1);
            h.zeros();
            if (t > i){
                for(int u=i;u <= t; ++u){
                    h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
                }
            }
            hhat.col(i) = h;
        }
        ht[t] = hhat;
    }

    return(ht);
}

/***R
g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(p*n),nrow=p,ncol=n)
dl=runif(n)
z=matrix(runif(p*n),nrow=n,ncol=p)
bench::mark(hello_orig(g=g,n=n,p=p,S = S,zc=z,dl = dl),
            hello(g=g,n=n,p=p,S = S,zc=z,dl = dl))
*/

Result:

# A tibble: 2 x 13
  expression                                                 min median `itr/sec` mem_alloc
  <bch:expr>                                              <bch:> <bch:>     <dbl> <bch:byt>
1 hello_orig(g = g, n = n, p = p, S = S, zc = z, dl = dl)  14.2s  14.2s    0.0703    58.7MB
2 hello(g = g, n = n, p = p, S = S, zc = z, dl = dl)      53.9ms 85.9ms   11.1       58.7MB
# … with 8 more variables: `gc/sec` <dbl>, n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
#   result <list>, memory <list>, time <list>, gc <list>

More than a factor 100 faster!

You can get cleaner (and maybe even a bit faster code) by floowing @coatless' suggestions in the comments to use an arma::cube. The most compact form will give you a different return structure, though. Instead of a list of p x n you will get a p x n x n array:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
arma::cube coatless( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){

    arma::cube ht(p, n, n);
    ht.zeros();

    for(int i = 0;i < n; ++i){
        arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
        for(int t=i+1; t < n;++t){
            h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
            ht.slice(t).col(i) = h;
        }
    }

    return(ht);
}

Upvotes: 4

Ralf Stubner
Ralf Stubner

Reputation: 26833

Your question title makes one think you see the problem in returning the data to R. Rest assured that this is not an issue. You can easily check this by calling a function that returns matrices of zeros in the required size:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List minimal( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){
    Rcpp::List ht(n);

    for(int t=0; t < n;++t){

        arma::mat hhat(p,n);
        hhat.zeros();
        ht[t] = hhat;
    }

    return(ht);
}

On my system this function takes about 0.01 s with your input data. In other words, your real function spends most of its time on computing the actual results.

As for optimizing that part, it would be helpful if you could provide an idea of what you are trying to implement, e.g. with the help of mathematical formulas. As it stands, I can only do some simple changes:

  • In the i loop you only do something for t > i. Therefore it is sufficient to let the loop run till i < t.

  • The u loop can be formulated as a matrix-vector product, for which efficient implementations exist.

With changes like this I end up with

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List hello( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){
    Rcpp::List ht(n);

    for(int t=0; t < n;++t){

        arma::mat hhat(p,n);
        hhat.zeros();
        for(int i = 0;i < t; ++i){
            arma::mat Sit = S.cols(i,t);
            hhat.col(i) = - exp(arma::as_scalar(g*zc.row(i).t())) * 
                (Sit.each_col() - zc.row(i).t()) * dl.subvec(i,t).t();
        }
        ht[t] = hhat;
    }

    return(ht);
}

On my system this is about a factor of two faster than your code. It might well be possible to get even faster, though.

Upvotes: 2

Related Questions