Reputation: 3621

Same random numbers in C++ as computed by Python3 numpy.random.rand

I would like to duplicate in C++ the testing for some code that has already been implemented in Python3 which relies on numpy.random.rand and randn values and a specific seed (e.g., seed = 1).

I understand that Python's random implementation is based on a Mersenne twister. The C++ standard library also supplies this in std::mersenne_twister_engine.

The C++ version returns an unsigned int, whereas Python rand is a floating point value.

Is there a way to obtain the same values in C++ as are generated in Python, and be sure that they are the same? And the same for an array generated by randn ?

Upvotes: 8

Answers (3)

Madeleine P. Vincent

Reputation: 3621

For completeness and to avoid re-inventing the wheel, here is an implementation for both numpy.rand and numpy.randn in C++

The header file:

#ifndef RANDOMNUMGEN_NUMPYCOMPATIBLE_H
#define RANDOMNUMGEN_NUMPYCOMPATIBLE_H

#include "RandomNumGenerator.h"
    
//Uniform distribution - numpy.rand
class RandomNumGen_NumpyCompatible {
public:
    RandomNumGen_NumpyCompatible();
    RandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);

    std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
    std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
    void seed(std::uint_fast32_t seed);
    void discard(unsigned long long);      // NOTE!!  Advances and discards twice as many values as passed in to keep tracking with Numpy order
    uint_fast32_t operator()();            //Simply returns the next Mersenne value from the engine
    double getDouble();                    //Calculates the next uniformly random double as numpy.rand does

    std::string getGeneratorType() const { return "RandomNumGen_NumpyCompatible"; }

private:
    std::mt19937 m_mersenneEngine;
};

///////////////////

//Gaussian distribution - numpy.randn
class GaussianRandomNumGen_NumpyCompatible {
public:
    GaussianRandomNumGen_NumpyCompatible();
    GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t newSeed);

    std::uint_fast32_t min() const { return m_mersenneEngine.min(); }
    std::uint_fast32_t max() const { return m_mersenneEngine.max(); }
    void seed(std::uint_fast32_t seed);
    void discard(unsigned long long);      // NOTE!!  Advances and discards twice as many values as passed in to keep tracking with Numpy order
    uint_fast32_t operator()();            //Simply returns the next Mersenne value from the engine
    double getDouble();                    //Calculates the next normally (Gaussian) distrubuted random double as numpy.randn does

    std::string getGeneratorType() const { return "GaussianRandomNumGen_NumpyCompatible"; }

private:
    bool m_haveNextVal;
    double m_nextVal;
    std::mt19937 m_mersenneEngine;
};

#endif

And the implementation:

#include "RandomNumGen_NumpyCompatible.h"

RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible()
{
}

RandomNumGen_NumpyCompatible::RandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_mersenneEngine(seed)
{
}

void RandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
    m_mersenneEngine.seed(newSeed);
}

void RandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
    //Advances and discards TWICE as many values to keep with Numpy order
    m_mersenneEngine.discard(2*z);
}

std::uint_fast32_t RandomNumGen_NumpyCompatible::operator()()
{
    return m_mersenneEngine();
}

double RandomNumGen_NumpyCompatible::getDouble()
{
    int a = m_mersenneEngine() >> 5;
    int b = m_mersenneEngine() >> 6;
    return (a * 67108864.0 + b) / 9007199254740992.0;
}

///////////////////

GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible()
: m_haveNextVal(false)
{
}

GaussianRandomNumGen_NumpyCompatible::GaussianRandomNumGen_NumpyCompatible(std::uint_fast32_t seed)
: m_haveNextVal(false), m_mersenneEngine(seed)
{
}

void GaussianRandomNumGen_NumpyCompatible::seed(std::uint_fast32_t newSeed)
{
    m_mersenneEngine.seed(newSeed);
}

void GaussianRandomNumGen_NumpyCompatible::discard(unsigned long long z)
{
    //Burn some CPU cyles here
    for (unsigned i = 0; i < z; ++i)
        getDouble();
}

std::uint_fast32_t GaussianRandomNumGen_NumpyCompatible::operator()()
{
    return m_mersenneEngine();
}

double GaussianRandomNumGen_NumpyCompatible::getDouble()
{
    if (m_haveNextVal) {
        m_haveNextVal = false;
        return m_nextVal;
    }

    double f, x1, x2, r2;
    do {
        int a1 = m_mersenneEngine() >> 5;
        int b1 = m_mersenneEngine() >> 6;
        int a2 = m_mersenneEngine() >> 5;
        int b2 = m_mersenneEngine() >> 6;
        x1 = 2.0 * ((a1 * 67108864.0 + b1) / 9007199254740992.0) - 1.0;
        x2 = 2.0 * ((a2 * 67108864.0 + b2) / 9007199254740992.0) - 1.0;
        r2 = x1 * x1 + x2 * x2;
    } while (r2 >= 1.0 || r2 == 0.0);

    /* Box-Muller transform */
    f = sqrt(-2.0 * log(r2) / r2);
    m_haveNextVal = true;
    m_nextVal = f * x1;
    return f * x2;
}

Upvotes: 2

Madeleine P. Vincent

Reputation: 3621

After doing a bit of testing, it does seem that the values are within a tolerance (see @fdermishin 's comment below) when the C++ unsigned int is divided by the maximum value for an unsigned int like this:

  #include <limits>
  ...
  std::mt19937 generator1(seed);  // mt19937 is a standard mersenne_twister_engine
  unsigned val1 = generator1();
  std::cout << "Gen 1 random value: " << val1 << std::endl;
  std::cout << "Normalized Gen 1: " << static_cast<double>(val1) /  std::numeric_limits<std::uint32_t>::max() << std::endl;

However, Python's version seems to skip every other value. Given the following two programs:

#!/usr/bin/env python3

import numpy as np

def main():

    np.random.seed(1)
    
    for i in range(0, 10):
        print(np.random.rand())

###########

# Call main and exit success
if __name__ == "__main__":
    main()
    sys.exit()

and

#include <cstdlib>
#include <iostream>
#include <random>
#include <limits>

int main()
{
    unsigned seed = 1;

    std::mt19937 generator1(seed);  // mt19937 is a standard mersenne_twister_engine
    for (unsigned i = 0; i < 10; ++i) {
        unsigned val1 = generator1();
        std::cout << "Normalized, #" << i << ": " << (static_cast<double>(val1) / std::numeric_limits<std::uint32_t>::max()) << std::endl;
    }

    return EXIT_SUCCESS;
}

the Python program prints:

0.417022004702574
0.7203244934421581
0.00011437481734488664
0.30233257263183977
0.14675589081711304
0.0923385947687978
0.1862602113776709
0.34556072704304774
0.39676747423066994
0.538816734003357

whereas the C++ program prints:

Normalized, #0: 0.417022
Normalized, #1: 0.997185
Normalized, #2: 0.720324
Normalized, #3: 0.932557
Normalized, #4: 0.000114381
Normalized, #5: 0.128124
Normalized, #6: 0.302333
Normalized, #7: 0.999041
Normalized, #8: 0.146756
Normalized, #9: 0.236089

I could easily skip every other value in the C++ version, which should give me numbers that match the Python version (within a tolerance). But why would Python's implementation seem to skip every other value, or where do these extra values in the C++ version come from?

Upvotes: 1

fdermishin

Reputation: 3686

You can do it this way for integer values:

import numpy as np

np.random.seed(12345)
print(np.random.randint(256**4, dtype='<u4', size=1)[0])

#include <iostream>
#include <random>

int main()
{
    std::mt19937 e2(12345);
    std::cout << e2() << std::endl;
}

The result of both snippets is 3992670690

By looking at source code of rand you can implement it in your C++ code this way:

import numpy as np

np.random.seed(12345)
print(np.random.rand())

#include <iostream>
#include <iomanip>
#include <random>

int main()
{
    std::mt19937 e2(12345);
    int a = e2() >> 5;
    int b = e2() >> 6;
    double value = (a * 67108864.0 + b) / 9007199254740992.0;
    std::cout << std::fixed << std::setprecision(16) << value << std::endl;
}

Both random values are 0.9296160928171479

It would be convenient to use std::generate_canonical, but it uses another method to convert the output of Mersenne twister to double. The reason they differ is likely that generate_canonical is more optimized than the random generator used in NumPy, as it avoids costly floating point operations, especially multiplication and division, as seen in source code. However it seems to be implementation dependent, while NumPy produces the same result on all platforms.

double value = std::generate_canonical<double, std::numeric_limits<double>::digits>(e2);

This doesn't work and produces result 0.8901547132827379, which differs from the output of Python code.

Upvotes: 6

Same random numbers in C++ as computed by Python3 numpy.random.rand

Answers (3)

Related Questions