nalzok
nalzok

Reputation: 16157

Generating a random bit stream in Rcpp efficiently

I have an auxiliary function in the R package I'm currently building named rbinom01. Note that it calls random(3).

int rbinom01(int size) {
  if (!size) {
    return 0;
  }

  int64_t result = 0;
  while (size >= 32) {
    result += __builtin_popcount(random());
    size -= 32;
  }

  result += __builtin_popcount(random() & ~(LONG_MAX << size));

  return result;
}

When R CMD check my_package, I got the following warning:

* checking compiled code ... NOTE
File ‘ my_package/libs/my_package.so’:
  Found ‘_random’, possibly from ‘random’ (C)
    Object: ‘ my_function.o’

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

I headed to the Document, and it says I can use one of the *_rand function, along with a family of distribution functions. Well that's cool, but my package simply needs a stream of random bits rather than a random double. The easiest way I can have it is by using random(3) or maybe reading from /dev/urandom, but that makes my package "unportable".

This post suggests using sample, but unfortunately it doesn't fit into my use case. For my application, generating random bits is apparently critical to the performance, so I don't want it waste any time calling unif_rand, multiply the result by N and round it. Anyway, the reason I'm using C++ is to exploit bit-level parallelism.

Surely I can hand-roll my own PRNG or copy and paste the code of a state-of-the-art PRNG like xoshiro256**, but before doing that I would like to see if there are any easier alternatives.

Incidentally, could someone please link a nice short tutorial of Rcpp to me? Writing R Extensions is comprehensive and awesome but it would take me weeks to finish. I'm looking for a more concise version, but preferably it should be more informative than a call to Rcpp.package.skeleton.


As suggested by @Ralf Stubner's answer, I have re-wrote the original code as follow. However, I'm getting the same result every time. How can I seed it properly and at the same time keep my code "portable"?

int rbinom01(int size) {
  dqrng::xoshiro256plus rng;

  if (!size) {
    return 0;
  }

  int result = 0;
  while (size >= 64) {
    result += __builtin_popcountll(rng());
    Rcout << sizeof(rng()) << std::endl;
    size -= 64;
  }

  result += __builtin_popcountll(rng() & ((1LLU << size) - 1));

  return result;
}

Upvotes: 1

Views: 171

Answers (1)

Ralf Stubner
Ralf Stubner

Reputation: 26843

There are different R packages that make PRNGs available as C++ header only libraries:

  • BH: Everything from boost.random
  • sitmo: Various Threefry versions
  • dqrng: PCG family, xoshiro256+ and xoroshiro128+
  • ...

You can make use of any of these by adding LinkingTo to your package's DECRIPTION. Typically these PRNGs are modeled after the C++11 random header, which means you have to control their life-cycle and seeding yourself. In a single-threaded environment I like to use anonymous namespaces for life-cycle control, e.g.:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

namespace {
dqrng::xoshiro256plus rng{};
}

// [[Rcpp::export]]
void set_seed(int seed) {
  rng.seed(seed);
}

// [[Rcpp::export]]
int rbinom01(int size) {
  if (!size) {
    return 0;
  }

  int result = 0;
  while (size >= 64) {
    result += __builtin_popcountll(rng());
    size -= 64;
  }

  result += __builtin_popcountll(rng() & ((1LLU << size) - 1));

  return result;
}

/*** R
set_seed(42)
rbinom01(10)
rbinom01(10)
rbinom01(10)
*/

However, using runif isn't all bad and certainly faster than accessing /dev/urandom. In dqrng there is a convenient wrapper for this.

As for tutorials: Besides WRE the Rcpp package vignette is a must read. R Packages by Hadley Wickham also has a chapter on "compiled code" if you want to go the devtools-way.

Upvotes: 4

Related Questions