Reputation: 2091
Languages like C++ require the programmer to set the seed of the random number generator, otherwise its output will always be the same. However, libraries like numpy do not require you to initialize the seed manually.
For example, code like:
from numpy.random import rand
rand()
gives a different result every time.
Does this mean that numpy.random.RandomState(seed=None) is called every time you call rand?
Upvotes: 3
Views: 2754
Reputation: 74182
The numpy.random module is like the random module from the Python standard library, in that the functions in numpy.random are bound methods of a hidden generator object that is instantiated when you import the module. This hidden numpy.random.RandomState instance currently lives in np.random.mtrand._rand (although you shouldn't rely on it always being there in future versions of numpy):
import numpy as np

print(np.random.rand)
# <built-in method rand of mtrand.RandomState object at 0x7f50ced03660>
# note the same memory address of the RandomState object:
print(np.random.mtrand._rand)
# <mtrand.RandomState object at 0x7f50ced03660>
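The same relationship can be checked without touching the private _rand name. This is a small sketch that assumes the module-level functions are still bound methods (so they expose __self__), which is an implementation detail and may change between NumPy versions; the variable name hidden is only for illustration.

```python
import numpy as np

# A bound method exposes the object it is bound to via __self__.
hidden = np.random.rand.__self__

# The hidden object is a RandomState instance...
print(isinstance(hidden, np.random.RandomState))

# ...and other module-level functions are bound to the very same instance.
print(np.random.uniform.__self__ is hidden)
```

If both lines print True, rand and uniform draw from one shared generator rather than creating a new one per call.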
The hidden RandomState instance will be seeded only once when you import the module (unless you then set the seed explicitly using np.random.seed()). If a new seed were chosen every time you called rand() then there would be no way to create reproducible sequences of pseudorandom numbers.
The situation looks something like:
# implicit RandomState created and seeded
from numpy import random
# we could subsequently re-seed the hidden RandomState, e.g.:
# random.seed(None)
# different random variates
r1 = random.rand(1)
r2 = random.rand(1)
r3 = random.rand(1)
# ...
The automatic seeding is equivalent to np.random.RandomState(None), which uses some platform-dependent source of randomness (usually /dev/urandom on *nix) to set the seed.
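To make the difference between seed=None and a fixed seed concrete, here is a short sketch using separate RandomState objects so the hidden module-level generator is left alone. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators seeded from system entropy: their outputs should differ.
a = np.random.RandomState(None).rand(3)
b = np.random.RandomState(None).rand(3)

# Two generators given the same explicit seed: their outputs are identical.
c = np.random.RandomState(42).rand(3)
d = np.random.RandomState(42).rand(3)

print(np.array_equal(a, b))  # expected to be False (a collision is astronomically unlikely)
print(np.array_equal(c, d))  # True
```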
Upvotes: 2
Reputation: 365807
Does that mean numpy.random.RandomState(seed=None) is called every time you call rand?
No, it means the RandomState is seeded once at startup. If it were re-seeded every time you call rand, then there would be no way to explicitly ask for a repeatable pattern.
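A quick way to see why per-call re-seeding would break repeatability: the sketch below seeds the hidden generator explicitly, draws twice, then re-seeds and draws again. The seed value 123 is arbitrary, and reseeding_rand is a hypothetical helper that mimics what "re-seed on every call" would look like.

```python
import numpy as np

np.random.seed(123)
first = np.random.rand(3)
second = np.random.rand(3)   # continues the same sequence, so it differs from first

np.random.seed(123)
replay = np.random.rand(3)   # same seed, same starting point, same values as first

print(np.array_equal(first, replay))   # True
print(np.array_equal(first, second))   # False


def reseeding_rand():
    """Hypothetical rand that re-seeds from system entropy on every call."""
    np.random.seed(None)
    return np.random.rand()


# With per-call re-seeding, an explicit seed is immediately overwritten,
# so asking for seed 123 twice no longer gives the same value.
np.random.seed(123)
x = reseeding_rand()
np.random.seed(123)
y = reseeding_rand()
print(x == y)  # expected to be False
```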
The same is true for the Python stdlib's random module.
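For comparison, the stdlib module can be exercised the same way. This is only an illustrative sketch; the seed value 2024 is arbitrary.

```python
import random

# Seed the module's shared generator explicitly, then draw three values.
random.seed(2024)
first = [random.random() for _ in range(3)]

# Re-seeding with the same value replays the same sequence.
random.seed(2024)
replay = [random.random() for _ in range(3)]

# Without re-seeding, the generator simply continues its sequence.
continued = [random.random() for _ in range(3)]

print(first == replay)      # True
print(first == continued)   # False
```

Calling random.seed() with no argument goes back to seeding from the system's entropy source (or the current time if none is available).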
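And, despite what you say about C++, engines in the C++ stdlib's <random> are also seeded once, when they are constructed, not on every call. (One caveat: a default-constructed engine such as std::mt19937 uses a fixed default seed, so for non-repeatable output you seed it from std::random_device.)
NumPy and the Python stdlib both document that the default seed, if you don't do anything, comes from something like the system time or a system entropy generator (like /dev/urandom on most *nix systems).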
This is not the case for C's rand (which is still there in C++, although you should treat it as deprecated*), but only because C goes out of its way to require that startup must do the equivalent of calling srand(1).
If you're interested in exactly how the "once at startup" works in NumPy:
1. When the numpy.random module is first run (which happens the first time you import numpy.random or from numpy.random import something in your code), it constructs a global RandomState, with the default arguments (meaning seed=None).
2. RandomState's initializer just passes the seed argument on to the seed method.
3. RandomState.seed, when called with None, uses an appropriate source of system entropy for your platform (like /dev/urandom).
4. When you call rand, it uses that global RandomState.
* Not because of this problem; it's easy enough to remember to call srand at the start of your program. But a PRNG that explicitly doesn't guarantee a cycle length longer than 32767, an unbiased distribution, etc. is just a bad idea for almost anything…
Upvotes: 6