Reputation: 1698
For the purpose of reproducibility, one has to choose a seed. In R, we can use set.seed()
.
My question is, when the seed is not set explicitly, how does the computer choose the seed?
Why is there no default seed?
Upvotes: 9
Views: 4018
Reputation: 2956
A pseudo random number generator (PRNG) needs a default start value, which you can set with set.seed()
. If there is no given it generally takes computer based information. This could be time, cpu temperatur or something similar. If you want a more random start value it is possible to use physical values, like white noise or nuclear decay, but you generally need an extern information source for this kind of random information.
The documentation mentions R uses current time and the process ID:
Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.
A default seed is a bad idea, since a random generators would always produce the same samples of numbers by default. If you always take the same seed it's not anymore randomized, since there will be always the same numbers. So you just provide a fixed data sample, which is not the intended output of a PRNG. You could of course turn the default seed off (if there would be one), but the intended function is primary to generate a completely random set of data and not a fixed one.
For statistical approaches it matters for validation and verification reasons, but it's getting more important when you get to cryptography. In this field a good PRNG is mandatory.
Upvotes: 13