Performance of runif

Question

I am working on a custom bootstrap algorithm for a specific problem, and as I want a large number of replicates I do care about performance. In this regard, I have some questions on how to use runif properly. I'm aware that I could run benchmarks myself, but C++ optimization tends to be difficult and I would also like to understand the reasons for any difference.

First question:

Is the first code block faster than the second?

for (int i = 0; i < n_boot; i++) {
  new_random = runif(n);  //new_random is pre-allocated in class
  // do something with the random numbers
}

for (int i = 0; i < n_boot; i++) {
  NumericVector new_random = runif(n);
  // do something with the random numbers
}

It probably comes down to whether runif fills the left side or if it allocates and passes a new NumericVector.

Second question:

If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?

In case you are wondering, memory allocation takes up a sizable part of my processing time. I have reduced runtime by 30% by optimizing other unnecessary memory allocations away, so it does matter.

nrussell · Accepted Answer

I set up the following struct to try to represent your scenario accurately & facilitate the benchmarking:

#include <Rcpp.h>
#include <algorithm>  // std::generate_n
// [[Rcpp::plugins(cpp11)]]

struct runif_test {

  size_t runs;
  size_t each;

  runif_test(size_t runs, size_t each)
  : runs(runs), each(each)
  {}
  // Your first code block
  void pre_init() {
    Rcpp::NumericVector v = no_init();
    for (size_t i = 0; i < runs; i++) {
      v = Rcpp::runif(each);
    }
  }
  // Your second code block
  void post_init() {
    for (size_t i = 0; i < runs; i++) {
      Rcpp::NumericVector v = Rcpp::runif(each);
    }
  }
  // Generate 1 draw at a time  
  void gen_runif() {
    Rcpp::NumericVector v = no_init();
    for (size_t i = 0; i < runs; i++) {
      std::generate_n(v.begin(), each, []() -> double {
        return Rcpp::as<double>(Rcpp::runif(1));
      });
    }
  }
  // Reduce overhead of pre-allocated vector
  inline Rcpp::NumericVector no_init() {
    return Rcpp::NumericVector(Rcpp::no_init_vector(each));
  } 
};

where I benchmarked the following exported functions:

// [[Rcpp::export]]
void do_pre(size_t runs, size_t each) {
  runif_test obj(runs, each);
  obj.pre_init();
}

// [[Rcpp::export]]
void do_post(size_t runs, size_t each) {
  runif_test obj(runs, each);
  obj.post_init();
}

// [[Rcpp::export]]
void do_gen(size_t runs, size_t each) {
  runif_test obj(runs, each);
  obj.gen_runif();
}

Here are the results I got:

R>  microbenchmark::microbenchmark(
    do_pre(100, 10e4)
    ,do_post(100, 10e4)
    ,do_gen(100, 10e4)
    ,times=100L)
Unit: milliseconds
                 expr      min       lq      mean   median        uq       max neval
  do_pre(100, 100000) 109.9187 125.0477  145.9918 136.3749  152.9609  337.6143   100
 do_post(100, 100000) 103.1705 117.1109  132.9389 130.4482  142.7319  204.0951   100
  do_gen(100, 100000) 810.5234 911.3586 1005.9438 986.8348 1062.7715 1501.2933   100

R>  microbenchmark::microbenchmark(
    do_pre(100, 10e5)
    ,do_post(100, 10e5)
    ,times=100L)
Unit: seconds
                  expr      min       lq     mean   median       uq      max neval
  do_pre(100, 1000000) 1.355160 1.614972 1.740807 1.723704 1.815953 2.408465   100
 do_post(100, 1000000) 1.198667 1.342794 1.443391 1.429150 1.519976 2.042511   100

So, assuming I interpreted / accurately represented your second question,

If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?

with my gen_runif() member function, I think we can confidently say that this is not the optimal approach: going by the medians above, it is roughly 7x to 7.5x slower than the other two functions.
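One plausible reason gen_runif() is so slow is that every Rcpp::runif(1) call builds a length-one vector just to hand back a single double. A scalar approach would more likely pay off if each draw were a plain double written straight into a buffer that is allocated once. The sketch below shows that pattern in plain C++ so it can be compiled without R. ScalarUniform, fill_uniform and bootstrap_sum are hypothetical names, and std::mt19937 is only a stand-in for R's generator; inside Rcpp the per-draw call would be something like R::runif(0.0, 1.0) or unif_rand(). I have not benchmarked this against the numbers above.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a scalar RNG call. In Rcpp code the analogous
// per-draw call would be R::runif(0.0, 1.0); std::mt19937 is used here only
// so the sketch compiles without R.
struct ScalarUniform {
    std::mt19937 engine;
    std::uniform_real_distribution<double> dist{0.0, 1.0};

    explicit ScalarUniform(unsigned seed) : engine(seed) {}

    // Return one draw from [0, 1) as a plain double (no vector is created).
    double operator()() { return dist(engine); }
};

// Overwrite every element of an existing buffer in place.
// Nothing is allocated here; the caller owns the storage.
void fill_uniform(std::vector<double>& buf, ScalarUniform& rng) {
    for (double& x : buf) {
        x = rng();
    }
}

// Run n_boot replicates of size n, reusing one buffer for all of them.
// The sum of all draws is returned so the work has an observable result.
double bootstrap_sum(std::size_t n_boot, std::size_t n, unsigned seed) {
    ScalarUniform rng(seed);
    std::vector<double> buf(n);  // allocated once, outside the loop
    double total = 0.0;
    for (std::size_t i = 0; i < n_boot; ++i) {
        fill_uniform(buf, rng);  // refill in place on every replicate
        for (double x : buf) {
            total += x;
        }
    }
    return total;
}
```

Calling bootstrap_sum(100, 100000, 1) would correspond to the (100, 10e4) case in the benchmark, with a single allocation for the whole run instead of one per replicate.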

More importantly, to address your first question, it seems that it is a little faster to just initialize & assign a new NumericVector to the output of Rcpp::runif(n). I'm certainly no C++ expert, but I believe the second method (assigning to a new local object) was faster than the first because of copy elision. In the second case, it looks as though two objects are being created - the object on the left of the =, v, and a (temporary? rvalue?) object on the right side of the =, which is the result of Rcpp::runif(). In reality though, the compiler will most likely optimize this unnecessary step out - which I think is explained in this passage from the article I linked:

When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type ... the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to.
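To make the quoted rule concrete, here is a small counting sketch in plain C++. Buffer, Counts, make_buffer and the two count_* functions are hypothetical names made up for illustration; Rcpp::NumericVector has its own copy and assignment semantics, so only the counting idea carries over, not the exact numbers. It assumes a compiler that performs copy elision (guaranteed from C++17, and the default behaviour of mainstream compilers before that).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Tally of which special member functions ran.
struct Counts {
    int constructed = 0;  // Buffer(n) calls
    int copied = 0;       // copy constructor calls
    int moved = 0;        // move constructor calls
    int assigned = 0;     // copy or move assignment calls
};

// Hypothetical vector-like type that records how it is created and assigned.
struct Buffer {
    static Counts counts;
    std::vector<double> data;

    explicit Buffer(std::size_t n) : data(n) { ++counts.constructed; }
    Buffer(const Buffer& o) : data(o.data) { ++counts.copied; }
    Buffer(Buffer&& o) noexcept : data(std::move(o.data)) { ++counts.moved; }
    Buffer& operator=(const Buffer& o) {
        data = o.data;
        ++counts.assigned;
        return *this;
    }
    Buffer& operator=(Buffer&& o) noexcept {
        data = std::move(o.data);
        ++counts.assigned;
        return *this;
    }
};
Counts Buffer::counts;

// Returns a nameless temporary, like the right-hand side in the answer.
Buffer make_buffer(std::size_t n) { return Buffer(n); }

// Second code block from the question: initialize a new local from the result.
Counts count_initialization(std::size_t n) {
    Buffer::counts = Counts();
    Buffer v = make_buffer(n);  // temporary is built directly in v
    (void)v;
    return Buffer::counts;
}

// First code block from the question: assign the result to an existing object.
Counts count_assignment(std::size_t n) {
    Buffer v(n);                // pre-allocated, not counted below
    Buffer::counts = Counts();
    v = make_buffer(n);         // temporary is built, then move-assigned into v
    return Buffer::counts;
}
```

With elision in effect, count_initialization should report one construction and no copies, moves or assignments, while count_assignment should report one construction plus one assignment. Note that a fresh buffer is constructed on the right-hand side in both cases, so neither form reuses the storage on the left.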

This was, at least, how I interpreted the results. Hopefully someone who is more well-versed in the language can confirm / deny / correct this conclusion.
