David542
David542

Reputation: 110203

Improved way to get subsets of a list

The following is how I'm getting subsets of a list:

def subsets(s):
    if not s: return [[]]
    rest = subsets(s[1:])
    return rest + map(lambda x: [s[0]] + x, rest)

x = sorted(subsets([1,2,3]), key=lambda x: (len(x), x))
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]

It works, but is a bit slow. I'm wondering what might be a better less-naive algorithm to get all subsets in python (without just using something like itertools.combinations).

Upvotes: 1

Views: 148

Answers (1)

joanis
joanis

Reputation: 12229

It would be much more efficient to avoid storing the list of subsets in memory unless you actually need it.

Solution 1: return a generator as output

This code accomplishes that with a recursive generator function:

def subsets(s):
    if not s:
        yield []
    else:
        for x in subsets(s[1:]):
            yield x
            yield [s[0]] + x

This function will lazily do its work only as required by the caller and never stores more than just the call stack in memory.

The first yield in the else clause serves to yield all elements of subsets(s[1:]) unchanged, i.e., those subsets of s that don't include s[0], while the second one yields all the subsets of s that do include s[0].

Of course, if you want this list sorted, the efficiency is lost when you call sorted() on the results, which builds the list in memory.

Solution 2: also accept a generator or an iterator as input

Solution 1 has the weakness that it only works on indexable s types, like lists and tuples, where s[0] and s[1:] are well defined.

If we want to make subsets more generic and accept any iterable (including iterators), we have to avoid that syntax and use next() instead:

def subsets(s):
    iterator = iter(s)
    try:
        item = next(iterator)
    except StopIteration:
        # s in empty, bottom out of the recursion
        yield []
    else:
        # s is not empty, recurse and expand
        for x in subsets(iterator):
            yield x
            yield [item] + x

Measuring performance

With all this, I wanted to know how much difference these solutions make, so I ran some benchmarks.

I compared your initial code, which I'm calling subsets_lists, my solution1, subsets_generator here, my solution2, subsets_gen_iterable. For completeness, I added the powerset function suggested in itertools.

Results:

All subsets of a list of size 10, each call repeated 10 times:

Code block 'l = subsets_lists(range(10)) repeats=10' took: 5.58828 ms
Code block 'g = subsets_generator(range(10)) repeats=10' took: 0.05693 ms
Code block 'g = subsets_gen_iterable(range(10)) repeats=10' took: 0.01316 ms
Code block 'l = list(subsets_generator(range(10))) repeats=10' took: 4.86464 ms
Code block 'l = list(subsets_gen_iterable(range(10))) repeats=10' took: 3.76597 ms
Code block 'l = list(powerset(range(10))) size=10 repeats=10' took: 1.11228 ms

All subsets of a list of size 20, each call repeated 10 times:

Code block 'l = subsets_lists(range(20)) repeats=10' took: 12144.99487 ms
Code block 'g = subsets_generator(range(20)) repeats=10' took: 65.18992 ms
Code block 'g = subsets_gen_iterable(range(20)) repeats=10' took: 0.01784 ms
Code block 'l = list(subsets_generator(range(20))) repeats=10' took: 10859.75128 ms
Code block 'l = list(subsets_gen_iterable(range(20))) repeats=10' took: 10074.26618 ms
Code block 'l = list(powerset(range(20))) size=20 repeats=10' took: 2336.81373 ms

Observations:
  • You can see that calling the two generator functions does not do much work unless the generators are fed into something that consumes it, like the list() constructor. Although subsets_generator does seem to do a lot more work up front than subsets_gen_iterable.
  • My two solutions yield modest speed improvements over your code
  • The itertools-based solution is still much faster.
Actual benchmarking code:
import itertools
import linetimer

def subsets_lists(s):
    if not s: return [[]]
    rest = subsets_lists(s[1:])
    return rest + list(map(lambda x: [s[0]] + x, rest))

def subsets_generator(s):
    if not s:
        yield []
    else:
        for x in subsets_generator(s[1:]):
            yield x
            yield [s[0]] + x

def subsets_iterable(s):
    iterator = iter(s)
    try:
        item = next(iterator)
    except StopIteration:
        yield []
    else:
        for x in subsets_iterable(iterator):
            yield x
            yield [item] + x

def powerset(iterable):
    "Source: https://docs.python.org/3/library/itertools.html#itertools-recipes"
    s = list(iterable)
    return itertools.chain.from_iterable(
        itertools.combinations(s, r)
        for r in range(len(s)+1)
    )

for repeats in 1, 10:
    for size in 4, 10, 14, 20:
        with linetimer.CodeTimer(f"l = subsets_lists(range({size})) repeats={repeats}"):
            for _ in range(repeats):
                l = subsets_lists(range(size))
        with linetimer.CodeTimer(f"g = subsets_generator(range({size})) repeats={repeats}"):
            for _ in range(repeats):
                l = subsets_generator(range(size))
        with linetimer.CodeTimer(f"g = subsets_gen_iterable(range({size})) repeats={repeats}"):
            for _ in range(repeats):
                l = subsets_iterable(range(size))
        with linetimer.CodeTimer(f"l = list(subsets_generator(range({size}))) repeats={repeats}"):
            for _ in range(repeats):
                l = list(subsets_generator(range(size)))
        with linetimer.CodeTimer(f"l = list(subsets_gen_iterable(range({size}))) repeats={repeats}"):
            for _ in range(repeats):
                l = list(subsets_iterable(range(size)))
        with linetimer.CodeTimer(f"l = list(powerset(range({size}))) size={size} repeats={repeats}"):
            for _ in range(repeats):
                l = list(powerset(range(size)))

Upvotes: 2

Related Questions