Richard
Richard

Reputation: 61269

Method for generating a random bitset of uniform distribution

How can I generate a bitset whose length is a multiple of 8 (corresponding to a standard data type) wherein each bit is 0 or 1 with equal probability?

Upvotes: 2

Views: 448

Answers (2)

Rakete1111
Rakete1111

Reputation: 48938

Here is a nice function to achieve this:

template<typename T, std::size_t N = sizeof(T) * CHAR_BIT> //CHAR_BIT is 8 on most
                                                           //architectures
auto randomBitset() {
    std::uniform_int_distribution<int> dis(0, 1);
    std::mt19937 mt{ std::random_device{}() };

    std::string values;
    for (std::size_t i = 0; i < N; ++i)
        values += dis(mt) + '0';

    return std::bitset<N>{ values };
}

Upvotes: 0

Richard
Richard

Reputation: 61269

The following works.

  • Choose a PRNG with good statistical properties
  • Seed it well
  • Generate integers over an inclusive range including the minimum and maximum of the integer type.
  • Since the integers are uniformly distributed across their entire range, each bit representation must be equally probable. Since all bit representations are present, each bit is equally like to be on or off.

The following code accomplishes this:

#include <cstdint>
#include <iostream>
#include <random>
#include <algorithm>
#include <functional>
#include <bitset>

//Generate the goodness
template<class T>
T uniform_bits(std::mt19937& g){
  std::uniform_int_distribution<T> dist(std::numeric_limits<T>::lowest(),std::numeric_limits<T>::max());
  return dist( g );
}

int main(){
  //std::default_random_engine can be anything, including an engine with short
  //periods and bad statistical properties. Rather than cross my finers and pray
  //that it'll somehow be okay, I'm going to rely on an engine whose strengths
  //and weaknesses I know.
  std::mt19937 engine;

  //You'll see a lot of people write `engine.seed(std::random_device{}())`. This
  //is bad. The Mersenne Twister has an internal state of 624 bytes. A single
  //call to std::random_device() will give us 4 bytes: woefully inadequate. The
  //following method should be slightly better, though, sadly,
  //std::random_device may still return deterministic, poorly distributed
  //numbers.
  std::uint_fast32_t seed_data[std::mt19937::state_size];
  std::random_device r;
  std::generate_n(seed_data, std::mt19937::state_size, std::ref(r));
  std::seed_seq q(std::begin(seed_data), std::end(seed_data));
  engine.seed(q);

  //Use bitset to print the numbers for analysis
  for(int i=0;i<50000;i++)
    std::cout<<std::bitset<64>(uniform_bits<uint64_t>(engine))<<std::endl;

  return 0;
}

We can test the output by compiling (g++ -O3 test.cpp) and doing some stats with:

./a.out | sed -E 's/(.)/ \1/g' | sed 's/^ //' | numsum -c | tr " " "\n" | awk '{print $1/25000}' | tr "\n" " "

The result is:

1.00368 1.00788 1.00416 1.0036 0.99224 1.00632 1.00532 0.99336 0.99768 0.99952 0.99424 1.00276 1.00272 0.99636 0.99728 0.99524 0.99464 0.99424 0.99644 1.0076 0.99548 0.99732 1.00348 1.00268 1.00656 0.99748 0.99404 0.99888 0.99832 0.99204 0.99832 1.00196 1.005 0.99796 1.00612 1.00112 0.997 0.99988 0.99396 0.9946 1.00032 0.99824 1.00196 1.00612 0.99372 1.00064 0.99848 1.00008 0.99848 0.9914 1.00008 1.00416 0.99716 1.00868 0.993 1.00468 0.99908 1.003 1.00384 1.00296 1.0034 0.99264 1 1.00036

Since all of the values are "close" to one, we conclude that our mission is accomplished.

Upvotes: 2

Related Questions