Rich95

Reputation: 351

How to optimize many rand() calls?

I have a particle simulator that requires many random-number calls, mostly for values between 0.0 and 1.0. It does this for every particle generated, so the calls add up quickly.

I counted no fewer than 60 of them per iteration: (double)(rand() % RAND_MAX) / (RAND_MAX)

They are used for many features ranging from color randomness to size to velocity to forces, etc.

My question is two-fold:

  1. Does the rand() function take a big toll on performance, or are 60 calls per particle not worth worrying about? I could easily have 1000 particles, which would equate to 60,000 rand() calls per iteration!

  2. I know I could compute, say, 5 random floats and reuse them across the 60 calls, but I worry that's a bad idea and I'll start seeing inconsistencies due to the reuse. Computing just one and reusing that would probably be a terrible idea, right?

Is there any better way to optimize this?

Upvotes: -1

Views: 249

Answers (3)

Jerry Coffin

Reputation: 490573

There's a major consideration here that nobody seems to have mentioned (and the question itself doesn't say enough to be sure whether it's an issue or not).

The potential issue arises if you call rand() from multiple threads of execution. rand() uses a single, global seed value. Each time you call it, that seed is updated so the next call produces a different value.

To make that work from multiple threads of execution, rand() will typically need to protect that seed with a mutex (or something similar). If you call rand() from multiple threads, the code ends up being serialized, so only one call to rand() runs at a time.

So, even with many threads, and enough cores to run them concurrently, heavy use of rand() will frequently result in making effective use of only one or two cores, rather than all that are available (though the exact core usage will vary depending on how often you're calling rand()).

The random number generators from the C++ <random> header don't suffer from this shortcoming. Each random number generator (that uses a seed) stores the seed in the generator object itself, so you can have 16 threads using 16 separate random number generator objects, with no synchronization between them at all.

Note, however, that std::random_device will normally retrieve numbers (at least indirectly) from some random number generation hardware, so frequent use of it is more or less serialized; you typically want to use it only to retrieve an initial seed for some pseudo-random generator. The library also directly supports generating numbers in a range, so (unlike your current code) the numbers you generate in the 0.0 ... 1.0 range won't be biased.
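A minimal sketch of the per-thread approach might look like this (the helper name uniform01 and the seeding scheme are illustrative, not part of any particular library):

#include <random>

// One engine per thread: each thread_local object is seeded once, the
// first time the owning thread calls this function. No shared state
// means no mutex and no serialization between threads.
double uniform01()
{
    thread_local std::mt19937 gen{std::random_device{}()};
    thread_local std::uniform_real_distribution<double> dis(0.0, 1.0);
    return dis(gen);
}

Each thread pays for std::random_device only once, at first use; after that, every call is a cheap, lock-free step of the thread's own engine.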

Upvotes: 2

jpmarinier

Reputation: 4733

As mentioned in some excellent comments and answers, in matters of efficiency, intuition and (even educated) guesses tend to be poor guides.

But if you happen to know the total number of random numbers generated, you can write a small test program that just generates that same count of random numbers, plus perhaps some cheap statistical moments for sanity-check purposes.

Next, you can compare the time taken by this small program with the total time taken by your actual physics application.

In order to have good and portable statistical guarantees, you can do that using the modern facilities provided by the standard C++ <random> header, rather than the legacy rand()/srand() functions.

The C++ code of your small test program could be like this:

//-- https://isocpp.org/files/papers/n3551.pdf
//-- https://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution

#include <cmath>    // std::sqrt
#include <cstdlib>  // EXIT_SUCCESS
#include <iostream>
#include <random>

int main()
{
    std::random_device  rd;  // Will be used to obtain a seed
                             // on Linux, could be:  rd{"/dev/urandom"};
    std::mt19937        gen(rd()); // Standard mersenne_twister_engine seeded with rd()
    std::uniform_real_distribution<double>  dis(0.0, 1.0);

    int     count = 60*1000*1000;
    double  sum1  = 0.0;
    double  sum2  = 0.0;
    double  r     = 0.0;

    for (int n = 0; n < count; ++n) {
        // Use dis to transform the random unsigned int generated by gen into a 
        // double in [0, 1). Each call to dis(gen) generates a new random double.
        r = dis(gen);
        sum1 += r;
        sum2 += r*r;
    }

    double avg   = sum1 / (double)count;
    double stdev = std::sqrt ((sum2 / (double)count) - avg*avg);

    std::cout << "count = " << count << '\n';
    std::cout << "average = " << avg << '\n';
    std::cout << "standard deviation = " << stdev << '\n';

    return EXIT_SUCCESS;
}

It turns out that on a vintage Intel x86-64 PC, the time to generate 60 million random numbers between 0.0 and 1.0 with this code is slightly below one second, compiling with option -O2 and this compiler:

g++ (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)

So we can expect that the time taken to generate 60,000 random numbers on a modern CPU is below one millisecond.

Upvotes: 2

Wolfgang Bangerth

Reputation: 245

This is not an answer to your question as much as a comment: Even experienced programmers cannot tell where the bottlenecks of their code are unless they actually profile the code. The same applies to your question: You are wondering whether a part of the code is a performance problem, but you present no evidence that you have measured this part, and consequently you may very well be wasting your time wondering about this issue.

So the recommendation would be: If your code is too slow, run it with a profiler and see whether the calls to generate random numbers show up anywhere close to the top. If they do, then you can start wondering about solutions. If they don't, well, then you just gained a couple of days to do something else!
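If a full profiler feels like overkill for a first check, even a crude timing harness settles the question of scale. Here is a minimal sketch: the 60,000 figure comes from the question itself; everything else is illustrative:

#include <chrono>
#include <cstdlib>
#include <iostream>

int main()
{
    auto t0 = std::chrono::steady_clock::now();

    double sum = 0.0;
    for (int n = 0; n < 60000; ++n)  // 60 calls x 1000 particles
        sum += (double)(rand() % RAND_MAX) / (RAND_MAX);  // the pattern from the question

    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> ms = t1 - t0;

    std::cout << "sum = " << sum << '\n';  // keeps the loop from being optimized away
    std::cout << "elapsed: " << ms.count() << " ms\n";
    return EXIT_SUCCESS;
}

If the elapsed time is a tiny fraction of one frame of your simulation, the random calls are not your bottleneck and you can stop worrying about them.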

Upvotes: 3
