Reputation: 2924
Given a number of players n, I need to find H, the list of all tuples where each tuple is a combination of coalitions of the players that respects this rule: each player appears exactly once, i.e. in only one coalition. For example, (1,2,3) is the coalition of players 1, 2 and 3, and ((1,2,3),(4,5),(6,)) is a combination of coalitions, which are themselves tuples.
P.S. Each combination of coalitions is called a layout in the code.
At the beginning I wrote a snippet in which I computed all combinations of all coalitions and, for each combination, checked the rule. The problem is that for 5-6 players the number of combinations of coalitions was already so big that my computer went phut. In order to avoid a big part of the computation (all possible combinations, the loop and the ifs) I wrote the following, which I tested and which is equivalent to the previous snippet:
from itertools import combinations, combinations_with_replacement, product, permutations

n = 3  # number of players (defined before this snippet in the original script)
players = range(1, n + 1)
# All coalitions grouped by size: coalitions[k] holds the (k+1)-player ones.
coalitions = [list(combinations(players, length)) for length in players]
# Seed H with the two trivial layouts: all singletons, and the grand coalition.
H = [tuple(coalitions[0]), (coalitions[-1][0],)]
# Integer partitions of n into 2..n-1 parts: the possible "shapes" of a layout.
combs = [comb for length in xrange(2, n) for comb in combinations_with_replacement(players, length) if sum(comb) == n]
perms = list(permutations(players))
# Slice each permutation according to each shape; frozensets drop duplicates.
layouts = set(frozenset(frozenset(perm[i:i+x]) for (i, x) in zip([0] + [sum(comb[:y]) for y in xrange(1, len(comb))], comb)) for comb in combs for perm in perms)
H.extend(tuple(tuple(tuple(coal) for coal in layout) for layout in layouts))
print H
EXPLANATION: say n = 3
First I create all possible coalitions:
coalitions = [[(1,),(2,),(3,)],[(1,2),(1,3),(2,3)],[(1,2,3)]]
Then I initialize H with the obvious combinations: each player in his own coalition and every player in the biggest coalition.
H = [((1,),(2,),(3,)),((1,2,3),)]
Then I compute all the possible forms of the layouts:
combs = [(1,2)] #(1,2) represents a layout in which there is
#one 1-player coalition and one 2-player coalition.
I compute the permutations (perms).
Finally, for each perm and for each comb I calculate the different possible layouts. I turn the result (layouts) into a set to remove duplicates and add it to H.
H = [((1,),(2,),(3,)),((1,2,3),),((1,2),(3,)),((1,3),(2,)),((2,3),(1,))]
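For reference, here is the approach above packaged as a self-contained function; the wrapper name enumerate_layouts is mine, not from the original script, and this is only a sketch of the same idea:

from itertools import combinations_with_replacement, permutations

def enumerate_layouts(n):
    # Hypothetical wrapper around the snippet above.
    players = tuple(range(1, n + 1))
    # The two trivial layouts: all singletons, and the grand coalition.
    H = [tuple((p,) for p in players), (players,)]
    # Integer partitions of n into 2..n-1 parts: the layout "shapes".
    combs = [comb for length in range(2, n)
             for comb in combinations_with_replacement(players, length)
             if sum(comb) == n]
    # Slice each permutation according to each shape; frozensets
    # collapse layouts that differ only in ordering.
    layouts = set(frozenset(frozenset(perm[i:i + x])
                            for (i, x) in zip([0] + [sum(comb[:y]) for y in range(1, len(comb))], comb))
                  for comb in combs for perm in permutations(players))
    H.extend(tuple(tuple(coal) for coal in layout) for layout in layouts)
    return H

print(enumerate_layouts(3))  # the five layouts above, up to ordering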
Here's the comparison:
python script.py
pypy script.py
Why is PyPy so much slower? What should I change?
Upvotes: 0
Views: 1018
Reputation: 1577
First, I want to point out that you are studying the Bell numbers, which might ease the next part of your work once you're done generating all the subsets. For example, it's easy to know in advance how large each Bell set will be; OEIS already has the sequence of Bell numbers.
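For example, a quick way to get those counts up front is the Bell triangle. Here is a small helper of my own (bell_numbers is not part of the answer's code) that you could use to sanity-check the size of each generated set:

def bell_numbers(count):
    # Bell triangle: each row starts with the last entry of the previous
    # row, and each further entry is its left neighbour plus the entry
    # above that neighbour. B(n) is the first entry of row n + 1.
    numbers = []
    row = [1]
    for _ in range(count):
        numbers.append(row[0])
        nxt = [row[-1]]
        for entry in row:
            nxt.append(nxt[-1] + entry)
        row = nxt
    return numbers

print(bell_numbers(12))
# [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, 678570]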
I hand-wrote the loops to generate the Bell sets; here is my code:
cache = {0: (), 1: ((set([1]),),)}

def bell(x):
    # Change these lines to alter memoization.
    if x in cache:
        return cache[x]
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            # Either add element x to one of the existing blocks...
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        # ...or put x in a new singleton block of its own.
        new.append(sets + (set([x]),))
    cache[x] = tuple(new)
    return cache[x]
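As a quick sanity check (the interactive session is my own), the sizes of the generated tuples line up with the Bell numbers:

>>> [len(bell(n)) for n in range(1, 8)]
[1, 2, 5, 15, 52, 203, 877]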
I included some memoization here for practical purposes. However, by commenting out some code and writing some other code, you can obtain the following un-memoized version, which I used for benchmarks:
def bell(x):
    if x == 0:
        return ()
    if x == 1:
        return ((set([1]),),)
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    return tuple(new)
My numbers are based on a several-year-old ThinkPad that I do most of my work on. Most of the smaller cases are way too fast to measure reliably (not even a single millisecond per trial for the first few), so my benchmarks test bell(9) through bell(11).
Benchmarks for CPython 2.7.11, using the standard timeit module:
$ python -mtimeit -s 'from derp import bell' 'bell(9)'
10 loops, best of 3: 31.5 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 176 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 1.07 sec per loop
And on PyPy 4.0.1, also using timeit:
$ pypy -mtimeit -s 'from derp import bell' 'bell(9)'
100 loops, best of 3: 14.3 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 90.8 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 675 msec per loop
So, the conclusion I've come to is that itertools is not very fast when you try to use it outside of its intended idioms. Bell numbers are interesting combinatorially, but they do not naturally arise from any simple composition of itertools widgets that I can find.
In response to your original query of what to do to make it faster: Just open-code it. Hope this helps!
~ C.
Upvotes: 3
Reputation: 231375
Here's a PyPy issue on itertools.product: https://bitbucket.org/pypy/pypy/issues/1677/itertoolsproduct-slower-than-nested-fors
Quoting from that issue:

Note that our goal is to ensure that itertools is not massively slower than plain Python, but we don't really care about making it exactly as fast (or faster) as plain Python. As long as it's not massively slower, it's fine. (At least I don't agree with you about whether a) or b) is easier to read :-)
Without studying your code in detail, it looks like it makes heavy use of the itertools combinations, permutations and product functions. In regular CPython those are written in compiled C code, with the intention of making them fast. PyPy does not implement the C code, so it shouldn't be surprising that these functions are slower.
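For illustration, the two spellings being compared in that issue look roughly like this (a minimal example of my own; pairs_a and pairs_b are made-up names). On CPython the product version leans on fast C code, while on PyPy the plain nested loops are what the JIT handles best:

from itertools import product

xs = range(100)
ys = range(100)

# (a) using itertools.product
pairs_a = [(x, y) for x, y in product(xs, ys)]

# (b) the same pairs from plain nested loops
pairs_b = [(x, y) for x in xs for y in ys]

assert pairs_a == pairs_b  # identical output, different performance profiles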
Upvotes: 1