halex
halex

Reputation: 16403

Shorter runtime of `random.shuffle` when using `random.random` as keyword argument in Python3

I just observed that when using Python3 the shuffling of a list with random.shuffle needs approximately half the runtime when explicitly submitting the function random.random for the random keyword argument. I checked whether Python2 has the same problem but found that it only occurs with Python3.

I use the following code to measure the runtime of the two versions:

from timeit import Timer
t1 = Timer("random.shuffle(l)", "import random; l = list(range(100000))")
t2 = Timer("random.shuffle(l, random = random.random)", "import random; l = list(range(100000))")
print("With default rand: %s" % t1.repeat(10,1))
print("With custom rand: %s" % t2.repeat(10,1))

I made a testcase at ideone for you to see with Python3 and the same code with Python2.

According to the documentation for shuffle the same function random.random is used in the default case when I omit the optional keyword argument random, so there should be no difference when I give it the same function to generate the random number as in the default case.

I checked the respective sources (Python2 vs. Python3) for the shuffle function in the Lib/random.py folders and found that they behave the same way if I explicitly call the Python3 version with a function for the random keyword. If I omit this argument, Python3 uses the helper function _randbelow so there should be the root for my problem. I can't see why Python3 uses _randbelow because it slows shuffle down. As far as I understand it, its benefit lies in generating arbitrary large random numbers, but it should not slow down my shuffling of a list that has way fewer than 2^32 elements (100000 in my case).

Can anyone explain to me why I'm seeing such a difference in the runtimes although they should be closer together when I use Python3?

P.S.: Please note that I'm not interested why runtime with Python2 is better than with Python3, but the difference in runtime when using the argument rand=rand.rand argument in Python3 versus not using it in Python3 only.

Upvotes: 8

Views: 653

Answers (1)

Schuh
Schuh

Reputation: 1095

The docstring in the function random.shuffle contradicts the code. In python 2.7.2+ the docstring is correct:

    def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

But in Python 3.2 we find:

def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    randbelow = self._randbelow
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i+1) if random is None else int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

So the docstring still tells the old story, but now the default function used is random.randbelow

Upvotes: 4

Related Questions