Reputation: 1702
I am copying an example from python docs.
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
How could we randomize the order of the values that we get while the result of powerset
remains lazily evaluated?
EDIT: The reason I want it is that I would like to compute the sum of the derived sets and stop as soon as I find two sets that have the same sum. If I am not mistaken, the problem is NP-complete.
Upvotes: 5
Views: 3510
Reputation: 35269
This is a lazy, and random solution:
import random
def powerset(seq):
n = 2**len(seq)
used = set([])
while len(used) < n:
choice = random.randint(0, n - 1)
if not (choice in used):
used.add(choice)
binary = bin(choice)[2:].zfill(len(seq))
yield [i[1] for i in zip(binary, seq) if i[0] == '1']
#or following line if > python 2.7:
#yield itertools.compress(seq, binary)
print list(powerset([1,2,3]))
print list(powerset([1,2,3]))
#output:
[[3], [1], [2, 3], [], [1, 2], [2], [1, 3], [1, 2, 3]]
[[2, 3], [1, 3], [1], [1, 2, 3], [1, 2], [2], [3], []]
If you consider the combinations of [1, 2, 3]
in binary:
n 123
0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111
Each combination can be labelled with a unique binary identifier. And there are always 2**len(seq)
combinations.... So:
0
, and 2**len(seq) - 1
.seq
.'0'
we exclude it from the output.This is lazy, and will work for large seq
.
Small caveat:
There can be a problem, but it probably does not matter to you. Towards the end of a sequence you could run into trouble with repeated redraws (which could consume some time). As the probability of drawing an already drawn number is number of successful draws / 2**len(seq)
, so on a given draw, g
, the expected number of draws to find a unused new number is:
n / (n - g)
#where n = 2**len(seq)
Which is fine, provided: n
is small, or for large n
: g << n
(either or both of these situations are very likely, so neither should be much of an issue). In fact, with large n
you can dispense with used
and the checking for repeats altogether, as the expected number of iterations until the first repetition approximates to n**0.5
.
Upvotes: 2
Reputation: 36715
Here is another idea: Store the combinations generators and yield randomly until you consume all. This randomizes the order of set sizes also.
Edit: I assume you don't care about the order of elements in a single set, since you will be summing them. If you do, you can put a random.shuffle(next_value)
before yield.
import itertools
import random
def random_powerset(l):
combs = [itertools.combinations(l,i) for i in range(len(l)+1)]
while combs:
comb_index = random.choice(range(len(combs)))
try:
next_value = next(combs[comb_index])
yield next_value
except StopIteration:
combs.pop(comb_index)
Output:
In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 2), (0, 1, 2), (1, 2), (), (0,), (1,), (2,)]
In : list(random_powerset(range(3)))
Out: [(0, 1, 2), (0,), (), (0, 1), (1,), (0, 2), (1, 2), (2,)]
In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 1, 2), (0, 2), (), (0,), (1,), (1, 2), (2,)]
In : list(random_powerset(range(3)))
Out: [(), (0,), (0, 1), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]
In : list(random_powerset(range(3)))
Out: [(), (0, 1), (0,), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]
In : list(random_powerset(range(3)))
Out: [(0, 1), (0,), (0, 2), (1, 2), (), (1,), (2,), (0, 1, 2)]
In : list(random_powerset(range(3)))
Out: [(), (0, 1, 2), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
Upvotes: 2
Reputation: 2535
It's possible to improve on Lattyware's solution somewhat if you go beyond itertools.chain
:
def chain_random(iterables):
iterables = list(iterables)
icount = len(iterables)
if icount == 0: return
while icount > 1:
shuffle(iterables)
try:
yield iterables[-1].next()
except StopIteration:
iterables.pop()
icount -= 1
for element in iterables[0]:
yield element
def random_powerset(iterable):
s = list(iterable)
lengths = list(range(len(s)+1))
shuffle(lengths)
return chain_random(combinations(s, r) for r in lengths if not shuffle(s))
Example output:
>>> list(random_powerset(range(3)))
[(), (2, 1, 0), (1, 0), (1, 2), (2,), (0, 2), (1,), (0,)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2, 1), (2,), (), (0, 2), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (0, 2), (0,), (1, 2), (2, 0, 1), (1,), (2,)]
>>> list(random_powerset(range(3)))
[(), (1, 2), (2,), (1, 0), (0,), (2, 0), (1,), (1, 0, 2)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (2,), (0, 2), (1, 2), (1,), (1, 2, 0), (0,)]
>>> list(random_powerset(range(3)))
[(0, 2, 1), (0,), (), (2, 0), (1,), (2, 1), (2,), (0, 1)]
itertools
is written in C, so chain_random
will be slower than itertools.chain
. But you get more randomization this way.
Upvotes: 1
Reputation: 89017
itertools.combinations()
gives us results in a set order from the input. Given this, we can shuffle our input list to produce a random order of elements (obviously, there will be a lot less possible orders for the outcome).
def random_powerset(iterable):
s = list(iterable)
lengths = list(range(len(s)+1))
shuffle(lengths)
return chain.from_iterable(combinations(s, r) for r in lengths if not shuffle(s))
(This is a bit of an ugly hack - we know shuffle(s)
will always return False
, so we can add it as a condition to ensure it's run for each call of combinations()
.)
We pre-generate the list of lengths, so that we can shuffle that too.
It's not perfectly random (there will still be an order - all elements of length n will be clustered together, for example, and those elements will be in an order depending on the random order of the input), but there will be a fair amount of randomness, if that is enough for you.
Example output:
>>> list(random_powerset(range(3)))
[(), (2,), (0,), (1,), (2, 1), (2, 0), (1, 0), (1, 2, 0)]
>>> list(random_powerset(range(3)))
[(), (0, 1), (0, 2), (1, 2), (0, 1, 2), (2,), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1, 2), (2,), (1,), (0,), (0, 2), (0, 1), (2, 1), ()]
>>> list(random_powerset(range(3)))
[(1, 2, 0), (0,), (2,), (1,), (), (0, 1), (0, 2), (1, 2)]
>>> list(random_powerset(range(3)))
[(), (2, 1), (2, 0), (1, 0), (0,), (2,), (1,), (2, 1, 0)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2), (0, 2, 1), (), (1,), (0,), (2,)]
I think that's the best you can do without making it non-lazy.
Upvotes: 2