Reputation: 11
I am trying to write high-performance code that uses random numbers, using Mersenne Twister. It takes roughly ~5ns
to generate a random unsigned long long
. This is used to generate a double
, however, these take ~40ns
to generate in a distribution.
Viewing the STL code the double
s, generated by a distribution, are generated by calls to std::generate_canonical
, which involves a std::ceil
and std::log2
operation, I believe it is these that are costly.
These operations are unnecessary as they are used to calculate the number of bits needed for calls to any RNG implementation. As this is known before compile time, I have written my own implementation that does not make these calls, and the time to generate a double
is ~15ns
.
Is it possible to specialize a templated STL function? If so how is this achieved, my attempts so far result in the original function still being used. I would like to specialize this STL function as I would still like to use the distributions in <random>
.
This is in Visual C++, though once the code has been developed it will be run on Linux and use either GCC or ICC. If the method for generating doubles on Linux is different, (and quicker), this problem is irrelevant.
Edit 1:
I believe all distributions requiring a double make calls to std::generate_canonical
, this function creates a double in the range [0,1) and the correct precision is created by iteratively adding calls to the RNG operator()
. The log2
and ceil
are used to calculate the number of iterations.
MSVC std::generate_canonical
// FUNCTION TEMPLATE generate_canonical
template<class _Real,
size_t _Bits,
class _Gen>
_Real generate_canonical(_Gen& _Gx)
{ // build a floating-point value from random sequence
_RNG_REQUIRE_REALTYPE(generate_canonical, _Real);
const size_t _Digits = static_cast<size_t>(numeric_limits<_Real>::digits);
const size_t _Minbits = _Digits < _Bits ? _Digits : _Bits;
const _Real _Gxmin = static_cast<_Real>((_Gx.min)());
const _Real _Gxmax = static_cast<_Real>((_Gx.max)());
const _Real _Rx = (_Gxmax - _Gxmin) + static_cast<_Real>(1);
const int _Ceil = static_cast<int>(_STD ceil(
static_cast<_Real>(_Minbits) / _STD log2(_Rx)));
const int _Kx = _Ceil < 1 ? 1 : _Ceil;
_Real _Ans = static_cast<_Real>(0);
_Real _Factor = static_cast<_Real>(1);
for (int _Idx = 0; _Idx < _Kx; ++_Idx)
{ // add in another set of bits
_Ans += (static_cast<_Real>(_Gx()) - _Gxmin) * _Factor;
_Factor *= _Rx;
}
return (_Ans / _Factor);
}
My Simplified Version
template<size_t _Bits>
double generate_canonical(std::mt19937_64& _Gx)
{ // build a floating-point value from random sequence
const double _Gxmin = static_cast<double>((_Gx.min)());
const double _Gxmax = static_cast<double>((_Gx.max)());
const double _Rx = (_Gxmax - _Gxmin) + static_cast<double>(1);
double _Ans = (static_cast<double>(_Gx()) - _Gxmin);
return (_Ans / _Rx);
}
This function is written in namespace std {}
Edit 2:
I found a solution please see my answer below.
Upvotes: 1
Views: 429
Reputation: 11
By creating a template function with all parameters set, and declaring the functions as inline
, it is possible to create user defined version of std::generate_canonical
.
User defined std::generate_canonical
:
namespace std {
template<>
inline double generate_canonical<double, static_cast<size_t>(-1), std::mt19937>(std::mt19937& _Gx)
{ // build a floating-point value from random sequence
const double _Gxmin = static_cast<double>((_Gx.min)());
const double _Rx = (static_cast<double>((_Gx.max)()) - _Gxmin) + static_cast<double>(1);
double _Ans = (static_cast<double>(_Gx()) - _Gxmin);
_Ans += (static_cast<double>(_Gx()) - _Gxmin) *_Rx;
return (_Ans / _Rx * _Rx);
}
template<>
inline double generate_canonical<double, static_cast<size_t>(-1), std::mt19937_64>(std::mt19937_64& _Gx)
{ // build a floating-point value from random sequence
const double _Gxmin = static_cast<double>((_Gx.min)());
const double _Rx = (static_cast<double>((_Gx.max)()) - _Gxmin) + static_cast<double>(1);
return ((static_cast<double>(_Gx()) - _Gxmin) / _Rx);
}
}
The second parameter static_cast<size_t>(-1)
should be modified to whatever value is used by specific libraries, this is the case for VC++ but may be different for GCC. This means it is not portable.
This function has been defined for std::mt19337
and std::mt19937_64
and seems to be used for STL distributions correctly.
Results:
double using std::generate_canonical
Generating 400000000 doubles using standard MT took: 17625 milliseconds
This equivalent to: 44.0636 nanoseconds per value
Generating 400000000 doubles using 64bit MT took: 11958 milliseconds
This equivalent to: 29.8967 nanoseconds per value
double using new generate_canonical
Generating 400000000 doubles using standard MT took: 4843 milliseconds
This equivalent to: 12.1097 nanoseconds per value
Generating 400000000 doubles using 64bit MT took: 2645 milliseconds
This equivalent to: 6.61362 nanoseconds per value
Upvotes: 0
Reputation: 179991
Sorry, specializing Standard Library functions is not allowed; doing so results in Undefined Behavior.
However, you can use alternative distributions; C++ has well-defined interfaces between generators and distributions.
Oh, and just to eliminate the possibility of a beginners error (since you don't show code): you do not create a new distribution for every number.
Upvotes: 1