Why does a combined call to random.randint and random.sample in a loop lead to a repeating output sequence?

Question

I am confused by the behaviour of the following code using random in python:

SEED = ... # see below for some examples

for _ in range(100):
    k = random.randint(1, 21)
    print(k)

    random.seed(SEED)
    s = random.sample(population=range(100), k=k)

I would expect the first print(k) to output a random number between 1 and 21, and the next 99 print(k) statements to always output the same random number, because the random seed is set to the same value in each iteration of the for loop.

Instead, I see several random values of k in the first few iterations, before the k sequence seemingly enters a repeating sequence once one of the repeating values is hit. The length of the sequence varies depending on the value of SEED.

A few examples to illustrate this (first occurrence of the repeating sequence in bold):

SEED = 4841 --> k = 14 1 21 1 21 1 21 ...
SEED = 5733 --> k = 9 16 10 6 4 11 6 4 11 6 4 11 ...
SEED = 22 --> k = 19 14 2 1 8 21 14 2 1 8 21 14 2 1 8 21 14 ...
SEED = 31 --> k = 19 17 5 5 5 ...

Some observations I've made trying to understand this:

Printing the sample s shows the same behaviour, i.e. given the same SEED, the same value k always yields the same value of s (as I would expect).
When I move the line random.seed(SEED) to the top of the for loop, I always get the same value of k as expected.
This really seems to be rooted in the interaction between random.randint and random.sample and the fact that I pass k as an argument. When I comment out the last line of the for loop or replace it with another call to random.randint or pass some fixed value like k=5, I get the expected behaviour of a constant sequence of k.

But I still don't understand why this leads to these repeating sequences, can anyone explain?

Sam Mason · Accepted Answer

As Tom pointed out in the comments you're generating k random numbers, which takes at least k calls (Python uses rejection sampling to ensure draws aren't biased) to the underlying RNG.

You could fix your example by doing something like:

import random

SEED = ... # see below for some examples
MAX_K = 21

for _ in range(100):
    k = random.randint(1, MAX_K)
    print(k)

    random.seed(SEED)
    s = random.sample(population=range(100), k=MAX_K)[:k]

but this obviously comes at a cost of drawing more values than you're likely to use.

Some RNGs support "seeking" but seeks tend to be relatively expensive (compared to drawing a random number) so this might not help this example, as you're only expecting to draw a few extra values. Further, the difficultly in determining how many rejections you're about to get would mean you'd probably need at least one run with k = MAX_K to discover how far ahead you need to seek.

Update: to explain why you get these repeating sequences lets rearrange your code:

SEED = ...

graph = {}
for k in range(1, 22):
    random.seed(SEED)
    random.sample(population=range(100), k=k)
    graph[k] = random.randint(1, 21)

This uses the random number generator the same as your original code, but generates a graph showing that if the previous iteration evaluated to a given k then what would the next round give.

If I run this for SEED = 22 I get the following graph:

showing the cycle that you saw in your question. The image was created by graphviz if you want to do something similar.

Why does a combined call to random.randint and random.sample in a loop lead to a repeating output sequence?

Answers (1)

Related Questions