Dimitris Leventeas
Dimitris Leventeas

Reputation: 1702

Randomize chain from itertools

I am copying an example from python docs.

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

How could we randomize the order of the values that we get while the result of powerset remains lazily evaluated?

EDIT: The reason I want it is that I would like to compute the sum of the derived sets and stop as soon as I find two sets that have the same sum. If I am not mistaken, the problem is NP-complete.

Upvotes: 5

Views: 3510

Answers (4)

fraxel
fraxel

Reputation: 35269

This is a lazy, and random solution:

import random

def powerset(seq):
    n = 2**len(seq)
    used = set([])
    while len(used) < n:
        choice = random.randint(0, n - 1)
        if not (choice in used):
            used.add(choice)
            binary = bin(choice)[2:].zfill(len(seq))
            yield [i[1] for i in zip(binary, seq) if i[0] == '1']
            #or following line if > python 2.7:
            #yield itertools.compress(seq, binary)

print list(powerset([1,2,3]))
print list(powerset([1,2,3]))
#output:
[[3], [1], [2, 3], [], [1, 2], [2], [1, 3], [1, 2, 3]]
[[2, 3], [1, 3], [1], [1, 2, 3], [1, 2], [2], [3], []]

If you consider the combinations of [1, 2, 3] in binary:

n  123 

0  000
1  001
2  010
3  011
4  100
5  101
6  110
7  111

Each combination can be labelled with a unique binary identifier. And there are always 2**len(seq) combinations.... So:

  1. Randomly select an integer between, 0, and 2**len(seq) - 1.
  2. check we haven't used it before (if we have, draw again).
  3. convert it to binary.
  4. zip it with seq.
  5. if the zipped binary digits are '0' we exclude it from the output.

This is lazy, and will work for large seq.

Small caveat:

There can be a problem, but it probably does not matter to you. Towards the end of a sequence you could run into trouble with repeated redraws (which could consume some time). As the probability of drawing an already drawn number is number of successful draws / 2**len(seq), so on a given draw, g, the expected number of draws to find a unused new number is:

n / (n - g)
#where n = 2**len(seq)

Which is fine, provided: n is small, or for large n: g << n (either or both of these situations are very likely, so neither should be much of an issue). In fact, with large n you can dispense with used and the checking for repeats altogether, as the expected number of iterations until the first repetition approximates to n**0.5.

Upvotes: 2

Avaris
Avaris

Reputation: 36715

Here is another idea: Store the combinations generators and yield randomly until you consume all. This randomizes the order of set sizes also.

Edit: I assume you don't care about the order of elements in a single set, since you will be summing them. If you do, you can put a random.shuffle(next_value) before yield.

import itertools
import random

def random_powerset(l):
    combs = [itertools.combinations(l,i) for i in range(len(l)+1)]
    while combs:
        comb_index = random.choice(range(len(combs)))
        try:
            next_value = next(combs[comb_index])
            yield next_value
        except StopIteration:
            combs.pop(comb_index)

Output:

In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 2), (0, 1, 2), (1, 2), (), (0,), (1,), (2,)]

In : list(random_powerset(range(3)))
Out: [(0, 1, 2), (0,), (), (0, 1), (1,), (0, 2), (1, 2), (2,)]

In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 1, 2), (0, 2), (), (0,), (1,), (1, 2), (2,)]

In : list(random_powerset(range(3)))
Out: [(), (0,), (0, 1), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]

In : list(random_powerset(range(3)))
Out: [(), (0, 1), (0,), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]

In : list(random_powerset(range(3)))
Out: [(0, 1), (0,), (0, 2), (1, 2), (), (1,), (2,), (0, 1, 2)]

In : list(random_powerset(range(3)))
Out: [(), (0, 1, 2), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]

Upvotes: 2

ben w
ben w

Reputation: 2535

It's possible to improve on Lattyware's solution somewhat if you go beyond itertools.chain:

def chain_random(iterables):
    iterables = list(iterables)
    icount = len(iterables)
    if icount == 0: return 
    while icount > 1:
        shuffle(iterables)
        try:
            yield iterables[-1].next()
        except StopIteration:
            iterables.pop()
            icount -= 1
    for element in iterables[0]:
        yield element

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain_random(combinations(s, r) for r in lengths if not shuffle(s))

Example output:

>>> list(random_powerset(range(3)))
[(), (2, 1, 0), (1, 0), (1, 2), (2,), (0, 2), (1,), (0,)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2, 1), (2,), (), (0, 2), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (0, 2), (0,), (1, 2), (2, 0, 1), (1,), (2,)]
>>> list(random_powerset(range(3)))
[(), (1, 2), (2,), (1, 0), (0,), (2, 0), (1,), (1, 0, 2)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (2,), (0, 2), (1, 2), (1,), (1, 2, 0), (0,)]
>>> list(random_powerset(range(3)))
[(0, 2, 1), (0,), (), (2, 0), (1,), (2, 1), (2,), (0, 1)]

itertools is written in C, so chain_random will be slower than itertools.chain. But you get more randomization this way.

Upvotes: 1

Gareth Latty
Gareth Latty

Reputation: 89017

itertools.combinations() gives us results in a set order from the input. Given this, we can shuffle our input list to produce a random order of elements (obviously, there will be a lot less possible orders for the outcome).

def random_powerset(iterable):
     s = list(iterable)
     lengths = list(range(len(s)+1))
     shuffle(lengths)
     return chain.from_iterable(combinations(s, r) for r in lengths if not shuffle(s))

(This is a bit of an ugly hack - we know shuffle(s) will always return False, so we can add it as a condition to ensure it's run for each call of combinations().)

We pre-generate the list of lengths, so that we can shuffle that too.

It's not perfectly random (there will still be an order - all elements of length n will be clustered together, for example, and those elements will be in an order depending on the random order of the input), but there will be a fair amount of randomness, if that is enough for you.

Example output:

>>> list(random_powerset(range(3)))
[(), (2,), (0,), (1,), (2, 1), (2, 0), (1, 0), (1, 2, 0)]
>>> list(random_powerset(range(3)))
[(), (0, 1), (0, 2), (1, 2), (0, 1, 2), (2,), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1, 2), (2,), (1,), (0,), (0, 2), (0, 1), (2, 1), ()]
>>> list(random_powerset(range(3)))
[(1, 2, 0), (0,), (2,), (1,), (), (0, 1), (0, 2), (1, 2)]
>>> list(random_powerset(range(3)))
[(), (2, 1), (2, 0), (1, 0), (0,), (2,), (1,), (2, 1, 0)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2), (0, 2, 1), (), (1,), (0,), (2,)]

I think that's the best you can do without making it non-lazy.

Upvotes: 2

Related Questions