Reputation: 110203
The following is how I'm getting subsets of a list:
def subsets(s):
if not s: return [[]]
rest = subsets(s[1:])
return rest + map(lambda x: [s[0]] + x, rest)
x = sorted(subsets([1,2,3]), key=lambda x: (len(x), x))
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
It works, but is a bit slow. I'm wondering what might be a better less-naive algorithm to get all subsets in python (without just using something like itertools.combinations
).
Upvotes: 1
Views: 148
Reputation: 12229
It would be much more efficient to avoid storing the list of subsets in memory unless you actually need it.
This code accomplishes that with a recursive generator function:
def subsets(s):
if not s:
yield []
else:
for x in subsets(s[1:]):
yield x
yield [s[0]] + x
This function will lazily do its work only as required by the caller and never stores more than just the call stack in memory.
The first yield
in the else
clause serves to yield all elements of subsets(s[1:])
unchanged, i.e., those subsets of s
that don't include s[0]
, while the second one yields all the subsets of s
that do include s[0]
.
Of course, if you want this list sorted, the efficiency is lost when you call sorted()
on the results, which builds the list in memory.
Solution 1 has the weakness that it only works on indexable s
types, like lists and tuples, where s[0]
and s[1:]
are well defined.
If we want to make subsets
more generic and accept any iterable (including iterators), we have to avoid that syntax and use next()
instead:
def subsets(s):
iterator = iter(s)
try:
item = next(iterator)
except StopIteration:
# s in empty, bottom out of the recursion
yield []
else:
# s is not empty, recurse and expand
for x in subsets(iterator):
yield x
yield [item] + x
With all this, I wanted to know how much difference these solutions make, so I ran some benchmarks.
I compared your initial code, which I'm calling subsets_lists
, my solution1, subsets_generator
here, my solution2, subsets_gen_iterable
. For completeness, I added the powerset
function suggested in itertools.
All subsets of a list of size 10, each call repeated 10 times:
Code block 'l = subsets_lists(range(10)) repeats=10' took: 5.58828 ms
Code block 'g = subsets_generator(range(10)) repeats=10' took: 0.05693 ms
Code block 'g = subsets_gen_iterable(range(10)) repeats=10' took: 0.01316 ms
Code block 'l = list(subsets_generator(range(10))) repeats=10' took: 4.86464 ms
Code block 'l = list(subsets_gen_iterable(range(10))) repeats=10' took: 3.76597 ms
Code block 'l = list(powerset(range(10))) size=10 repeats=10' took: 1.11228 ms
All subsets of a list of size 20, each call repeated 10 times:
Code block 'l = subsets_lists(range(20)) repeats=10' took: 12144.99487 ms
Code block 'g = subsets_generator(range(20)) repeats=10' took: 65.18992 ms
Code block 'g = subsets_gen_iterable(range(20)) repeats=10' took: 0.01784 ms
Code block 'l = list(subsets_generator(range(20))) repeats=10' took: 10859.75128 ms
Code block 'l = list(subsets_gen_iterable(range(20))) repeats=10' took: 10074.26618 ms
Code block 'l = list(powerset(range(20))) size=20 repeats=10' took: 2336.81373 ms
list()
constructor. Although subsets_generator
does seem to do a lot more work up front than subsets_gen_iterable
.itertools
-based solution is still much faster.import itertools
import linetimer
def subsets_lists(s):
if not s: return [[]]
rest = subsets_lists(s[1:])
return rest + list(map(lambda x: [s[0]] + x, rest))
def subsets_generator(s):
if not s:
yield []
else:
for x in subsets_generator(s[1:]):
yield x
yield [s[0]] + x
def subsets_iterable(s):
iterator = iter(s)
try:
item = next(iterator)
except StopIteration:
yield []
else:
for x in subsets_iterable(iterator):
yield x
yield [item] + x
def powerset(iterable):
"Source: https://docs.python.org/3/library/itertools.html#itertools-recipes"
s = list(iterable)
return itertools.chain.from_iterable(
itertools.combinations(s, r)
for r in range(len(s)+1)
)
for repeats in 1, 10:
for size in 4, 10, 14, 20:
with linetimer.CodeTimer(f"l = subsets_lists(range({size})) repeats={repeats}"):
for _ in range(repeats):
l = subsets_lists(range(size))
with linetimer.CodeTimer(f"g = subsets_generator(range({size})) repeats={repeats}"):
for _ in range(repeats):
l = subsets_generator(range(size))
with linetimer.CodeTimer(f"g = subsets_gen_iterable(range({size})) repeats={repeats}"):
for _ in range(repeats):
l = subsets_iterable(range(size))
with linetimer.CodeTimer(f"l = list(subsets_generator(range({size}))) repeats={repeats}"):
for _ in range(repeats):
l = list(subsets_generator(range(size)))
with linetimer.CodeTimer(f"l = list(subsets_gen_iterable(range({size}))) repeats={repeats}"):
for _ in range(repeats):
l = list(subsets_iterable(range(size)))
with linetimer.CodeTimer(f"l = list(powerset(range({size}))) size={size} repeats={repeats}"):
for _ in range(repeats):
l = list(powerset(range(size)))
Upvotes: 2