Clang
Clang

Reputation: 379

Why does a combined call to random.randint and random.sample in a loop lead to a repeating output sequence?

I am confused by the behaviour of the following code using random in python:

SEED = ... # see below for some examples

for _ in range(100):
    k = random.randint(1, 21)
    print(k)

    random.seed(SEED)
    s = random.sample(population=range(100), k=k)

I would expect the first print(k) to output a random number between 1 and 21, and the next 99 print(k) statements to always output the same random number, because the random seed is set to the same value in each iteration of the for loop.

Instead, I see several random values of k in the first few iterations, before the k sequence seemingly enters a repeating sequence once one of the repeating values is hit. The length of the sequence varies depending on the value of SEED.

A few examples to illustrate this (first occurrence of the repeating sequence in bold):

Some observations I've made trying to understand this:

But I still don't understand why this leads to these repeating sequences, can anyone explain?

Upvotes: 1

Views: 68

Answers (1)

Sam Mason
Sam Mason

Reputation: 16174

As Tom pointed out in the comments you're generating k random numbers, which takes at least k calls (Python uses rejection sampling to ensure draws aren't biased) to the underlying RNG.

You could fix your example by doing something like:

import random

SEED = ... # see below for some examples
MAX_K = 21

for _ in range(100):
    k = random.randint(1, MAX_K)
    print(k)

    random.seed(SEED)
    s = random.sample(population=range(100), k=MAX_K)[:k]

but this obviously comes at a cost of drawing more values than you're likely to use.

Some RNGs support "seeking" but seeks tend to be relatively expensive (compared to drawing a random number) so this might not help this example, as you're only expecting to draw a few extra values. Further, the difficultly in determining how many rejections you're about to get would mean you'd probably need at least one run with k = MAX_K to discover how far ahead you need to seek.

Update: to explain why you get these repeating sequences lets rearrange your code:

SEED = ...

graph = {}
for k in range(1, 22):
    random.seed(SEED)
    random.sample(population=range(100), k=k)
    graph[k] = random.randint(1, 21)

This uses the random number generator the same as your original code, but generates a graph showing that if the previous iteration evaluated to a given k then what would the next round give.

If I run this for SEED = 22 I get the following graph:

graphviz output

showing the cycle that you saw in your question. The image was created by graphviz if you want to do something similar.

Upvotes: 1

Related Questions