Reputation: 379
I am confused by the behaviour of the following code using random
in python:
SEED = ... # see below for some examples
for _ in range(100):
k = random.randint(1, 21)
print(k)
random.seed(SEED)
s = random.sample(population=range(100), k=k)
I would expect the first print(k)
to output a random number between 1 and 21, and the next 99 print(k)
statements to always output the same random number, because the random seed is set to the same value in each iteration of the for loop.
Instead, I see several random values of k
in the first few iterations, before the k
sequence seemingly enters a repeating sequence once one of the repeating values is hit. The length of the sequence varies depending on the value of SEED
.
A few examples to illustrate this (first occurrence of the repeating sequence in bold):
SEED = 4841
--> k = 14 1 21 1 21 1 21 ...SEED = 5733
--> k = 9 16 10 6 4 11 6 4 11 6 4 11 ...SEED = 22
--> k = 19 14 2 1 8 21 14 2 1 8 21 14 2 1 8 21 14 ...SEED = 31
--> k = 19 17 5 5 5 ...Some observations I've made trying to understand this:
s
shows the same behaviour, i.e. given the same SEED
, the same value k
always yields the same value of s
(as I would expect).random.seed(SEED)
to the top of the for loop, I always get the same value of k as expected.random.randint
and random.sample
and the fact that I pass k
as an argument. When I comment out the last line of the for loop or replace it with another call to random.randint
or pass some fixed value like k=5
, I get the expected behaviour of a constant sequence of k
.But I still don't understand why this leads to these repeating sequences, can anyone explain?
Upvotes: 1
Views: 68
Reputation: 16174
As Tom pointed out in the comments you're generating k
random numbers, which takes at least k
calls (Python uses rejection sampling to ensure draws aren't biased) to the underlying RNG.
You could fix your example by doing something like:
import random
SEED = ... # see below for some examples
MAX_K = 21
for _ in range(100):
k = random.randint(1, MAX_K)
print(k)
random.seed(SEED)
s = random.sample(population=range(100), k=MAX_K)[:k]
but this obviously comes at a cost of drawing more values than you're likely to use.
Some RNGs support "seeking" but seeks tend to be relatively expensive (compared to drawing a random number) so this might not help this example, as you're only expecting to draw a few extra values. Further, the difficultly in determining how many rejections you're about to get would mean you'd probably need at least one run with k = MAX_K
to discover how far ahead you need to seek.
Update: to explain why you get these repeating sequences lets rearrange your code:
SEED = ...
graph = {}
for k in range(1, 22):
random.seed(SEED)
random.sample(population=range(100), k=k)
graph[k] = random.randint(1, 21)
This uses the random number generator the same as your original code, but generates a graph showing that if the previous iteration evaluated to a given k
then what would the next round give.
If I run this for SEED = 22
I get the following graph:
showing the cycle that you saw in your question. The image was created by graphviz if you want to do something similar.
Upvotes: 1