Reputation: 27619
Using random.shuffle, I noticed that shuffling list(range(n)) takes about 25% more time than shuffling [0] * n. Here are times for sizes n from 1 million to 2 million:
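(A minimal sketch of the measurement, with machine-dependent timings; the full benchmark script is at the end of the question:)

import random
from timeit import timeit

n = 1_000_000
for label, make in [('list(range(n))', lambda: list(range(n))),
                    ('[0] * n', lambda: [0] * n)]:
    mylist = make()
    print(label, timeit(lambda: random.shuffle(mylist), number=1))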
Why is shuffling list(range(n)) slower? Unlike for sorting a list (which needs to look at the objects) or copying a list (which increases reference counters inside the objects), the objects shouldn't matter here. This should just rearrange pointers inside the list.
I also tried numpy.random.shuffle, where shuffling list(range(n)) is three times (!) slower than shuffling [0] * n:
I also tried a third way to rearrange the elements in the list, namely list.reverse, which, as expected, took equally long for both lists:
Just in case the shuffled order matters, I also tried list.reverse after shuffling the lists. Again, as expected, it took equally long for both lists, and also equally long as without that prior shuffling:
So what's the difference? Both shuffling and reversing only need to rearrange pointers inside the list, so why do the objects matter for shuffling but not for reversing?
My benchmark code producing the times:
import random
import numpy
from timeit import repeat, timeit
from collections import defaultdict

shufflers = {
    'random.shuffle(mylist)': random.shuffle,
    'numpy.random.shuffle(mylist)': numpy.random.shuffle,
    'list.reverse(mylist)': list.reverse,
}
creators = {
    'list(range(n))': lambda n: list(range(n)),
    '[0] * n': lambda n: [0] * n,
}

for shuffler in shufflers:
    print(shuffler)
    for creator in creators:
        print(creator)
        times = defaultdict(list)
        for _ in range(10):
            for i in range(10, 21):
                n = i * 100_000
                mylist = creators[creator](n)
                # Uncomment next line for pre-shuffling
                # numpy.random.shuffle(mylist)
                time = timeit(lambda: shufflers[shuffler](mylist), number=1)
                times[n].append(time)
        s = '%.6f ' * len(times[n])
        # Indent next line further to see intermediate results
        print([round(min(times[n]), 9) for n in sorted(times)])
Upvotes: 3
Views: 213
Reputation: 27619
The difference is that list.reverse, as a list function, has access to the underlying array of pointers. So it can indeed rearrange the pointers without looking at the objects in any way (source):
static void
reverse_slice(PyObject **lo, PyObject **hi)
{
    assert(lo && hi);

    --hi;
    while (lo < hi) {
        PyObject *t = *lo;
        *lo = *hi;
        *hi = t;
        ++lo;
        --hi;
    }
}
The random.shuffle and numpy.random.shuffle functions, on the other hand, only have an outsider's view and go through the list's interface, which involves briefly loading the objects in order to swap them:
# random.shuffle (CPython, Lib/random.py)
def shuffle(self, x, random=None):
    ...
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i+1)
        x[i], x[j] = x[j], x[i]

# numpy's shuffle (from numpy's Cython source)
def shuffle(self, object x, axis=0):
    ...
    for i in reversed(range(1, n)):
        j = random_interval(&self._bitgen, i)
        x[i], x[j] = x[j], x[i]
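One way to see that such a pure-Python swap really goes through the list's interface (loading the item objects and storing them back) is to disassemble it. Just an illustrative sketch; the exact opcode names vary across CPython versions:

import dis

def swap(x, i, j):
    x[i], x[j] = x[j], x[i]

dis.dis(swap)
# The subscript load/store opcodes push the two item objects onto the stack
# and write them back, so each swap touches the objects themselves.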
So there's at least potential for a lot of cache misses. But as a test, let's try a reverse implemented in Python:
def my_reverse(x):
    lo = 0
    hi = len(x) - 1
    while lo < hi:
        x[lo], x[hi] = x[hi], x[lo]
        lo += 1
        hi -= 1
Benchmarking that:
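(The measured numbers aren't reproduced here; one way to run the comparison is to register my_reverse in the benchmark script from the question:)

# Reuse the question's benchmark script by adding my_reverse as another
# "shuffler"; it is then timed on both list(range(n)) and [0] * n.
shufflers['my_reverse(mylist)'] = my_reverse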
Reversing list(range(n)) was just as fast as reversing [0] * n, despite loading the objects. The reason is that Python creates the objects pretty much sequentially in memory. Here's a test with a million objects. Almost all were located 16 bytes after the previous one:
>>> mylist = list(range(10**6))
>>> from collections import Counter
>>> ctr = Counter(id(b) - id(a) for a, b in zip(mylist, mylist[1:]))
>>> for distance, how_often in ctr.most_common():
...     print(distance, how_often)
16 996056
48 3933
-1584548240 1
-3024 1
2416 1
-2240 1
2832 1
-304 1
-96 1
-45005904 1
6160432 1
38862896 1
So no wonder it's fast, as that's very cache-friendly.
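(And for [0] * n every slot points to one and the same object, so any object access there always hits the same, already cached memory. A quick check:)

zeros = [0] * 5
print({id(v) for v in zeros})  # a single id: every slot references the same int object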
But now let's use our Python reversal on a shuffled list (like in the question with list.reverse, where it didn't make a difference):
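(The timing output isn't reproduced here; a minimal sketch of the comparison, assuming my_reverse from above, with machine-dependent numbers:)

import random
from timeit import timeit

n = 1_000_000
ordered = list(range(n))    # objects laid out roughly sequentially in memory
shuffled = list(range(n))
random.shuffle(shuffled)    # same objects, but successive slots now point all over the heap

print('ordered ', timeit(lambda: my_reverse(ordered), number=1))
print('shuffled', timeit(lambda: my_reverse(shuffled), number=1))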
Big difference, now that my_reverse loads objects from random locations all over the place, which is the opposite of cache-friendly.
And of course that's what happens with the shuffle functions as well. While list(range(n)) initially is cache-friendly, the shuffling picks random indices j to swap with, which is very cache-unfriendly. And while i just moves sequentially, it's going to encounter a lot of already randomly swapped objects, so that's cache-unfriendly as well.
Upvotes: 3
Reputation: 70047
(note: I didn't have time to finish this answer, so here's a start -- this definitely doesn't fit in a comment, hopefully it can help someone else finish this out!)
this appears to be due to locality of reference (and perhaps a cpython implementation detail -- I don't see the same results in pypy for example)
a few data points before an attempt at explanation:
- random.shuffle is implemented in pure python and works for any mutable sequence type -- it is not specialized for lists. each swap goes through the sequence interface: __getitem__, increasing the refcount of the item, __setitem__, decreasing the refcount of the item (see the sketch after this list)
- list.reverse is implemented in C and only works for list (using implementation details of a list). it does not involve calling __getitem__ or changing reference counts -- the internal items of the list are rearranged directly

the important bit being the reference counts
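a quick sketch of the first point (illustrative only): random.shuffle needs nothing beyond indexing, so it also shuffles an array.array, while list.reverse is a method of the list type and rejects anything else:

import random
from array import array

a = array('i', range(10))
random.shuffle(a)      # works: array supports __getitem__/__setitem__
print(list(a))

try:
    list.reverse(a)    # unbound list method, only accepts actual lists
except TypeError as e:
    print(e)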
in cpython, the reference count is stored with the object itself, and nearly all objects are stored in the heap. in order to adjust the reference count (even temporarily), the write to ob_refcnt pulls the PyObject structure into cache/memory/etc.
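a cpython-only sketch of that layout (it peeks at the object header via ctypes, relying on the implementation detail that ob_refcnt is the first field of every PyObject -- not a public api):

import ctypes
import sys

x = 123456  # a heap-allocated int (not one of the cached small ints)
# ob_refcnt sits at the very start of the object, next to its data,
# so even a temporary refcount change touches the object's memory
refcnt = ctypes.c_ssize_t.from_address(id(x)).value
print(refcnt, sys.getrefcount(x))  # getrefcount reports one extra, for its own argument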
(here's where I ran out of time -- I would probably do some memory fault analysis to confirm this hypothesis)
Upvotes: 5