Reputation: 16403
I just observed that when using Python3 the shuffling of a list with random.shuffle
needs approximately half the runtime when explicitly submitting the function random.random
for the random
keyword argument. I checked whether Python2 has the same problem but found that it only occurs with Python3.
I use the following code to measure the runtime of the two versions:
from timeit import Timer
t1 = Timer("random.shuffle(l)", "import random; l = list(range(100000))")
t2 = Timer("random.shuffle(l, random = random.random)", "import random; l = list(range(100000))")
print("With default rand: %s" % t1.repeat(10,1))
print("With custom rand: %s" % t2.repeat(10,1))
I made a testcase at ideone for you to see with Python3 and the same code with Python2.
According to the documentation for shuffle the same function random.random
is used in the default case when I omit the optional keyword argument random
, so there should be no difference when I give it the same function to generate the random number as in the default case.
I checked the respective sources (Python2 vs. Python3) for the shuffle
function in the Lib/random.py
folders and found that they behave the same way if I explicitly call the Python3 version with a function for the random
keyword. If I omit this argument, Python3 uses the helper function _randbelow
so there should be the root for my problem. I can't see why Python3 uses _randbelow
because it slows shuffle
down. As far as I understand it, its benefit lies in generating arbitrary large random numbers, but it should not slow down my shuffling of a list that has way fewer than 2^32 elements (100000 in my case).
Can anyone explain to me why I'm seeing such a difference in the runtimes although they should be closer together when I use Python3?
P.S.: Please note that I'm not interested why runtime with Python2 is better than with Python3, but the difference in runtime when using the argument rand=rand.rand
argument in Python3 versus not using it in Python3 only.
Upvotes: 8
Views: 653
Reputation: 1095
The docstring in the function random.shuffle
contradicts the code.
In python 2.7.2+ the docstring is correct:
def shuffle(self, x, random=None, int=int):
"""x, random=random.random -> shuffle list x in place; return None.
Optional arg random is a 0-argument function returning a random
float in [0.0, 1.0); by default, the standard random.random.
"""
if random is None:
random = self.random
for i in reversed(xrange(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = int(random() * (i+1))
x[i], x[j] = x[j], x[i]
But in Python 3.2 we find:
def shuffle(self, x, random=None, int=int):
"""x, random=random.random -> shuffle list x in place; return None.
Optional arg random is a 0-argument function returning a random
float in [0.0, 1.0); by default, the standard random.random.
"""
randbelow = self._randbelow
for i in reversed(range(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = randbelow(i+1) if random is None else int(random() * (i+1))
x[i], x[j] = x[j], x[i]
So the docstring still tells the old story, but now the default function used is random.randbelow
Upvotes: 4