ramzeek

Reputation: 2315

Partitioning non-unique entities into unique sets with Python

I have a list of items, e.g. [1,1,1,1,2,2], and I am trying to find all the unique groups where these items are bundled into tuples of length one or two. For example, for the group above, I would like to find the following 10 possible groupings:

[[(1,),(1,),(1,),(1,),(2,),(2,)],
 [(1,1),(1,),(1,),(2,),(2,)],
 [(1,2),(1,),(1,),(1,),(2,)],
 [(2,2),(1,),(1,),(1,),(1,)],
 [(1,1),(1,1),(2,),(2,)],
 [(1,1),(1,2),(1,),(2,)],
 [(1,2),(1,2),(1,),(1,)],
 [(2,2),(1,1),(1,),(1,)],
 [(1,1),(1,1),(2,2)],
 [(1,1),(1,2),(1,2)]]

I've been playing with itertools, but can only manage to use it to find the unique possible tuples (e.g. set(list(itertools.combinations((1,1,1,1,2,2),2)))), and any searches I do turn up solutions where the size of each group is constant and/or duplication of elements is not considered (example1, example2).
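For concreteness, the combinations attempt above only recovers the distinct pairs themselves; it says nothing about how to bundle the whole list into groups:

```python
import itertools

# The combinations-based attempt: it yields the distinct pairs that can
# be drawn from the list, not full groupings of the list.
pairs = set(itertools.combinations((1, 1, 1, 1, 2, 2), 2))
print(sorted(pairs))  # [(1, 1), (1, 2), (2, 2)]
```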

Ultimately, I am looking for a solution that will work for cases that are all ones ([1,1,1,...,1]), all twos ([2,2,2,...,2]) or some intermediate combination that includes an arbitrary number of 1s and 2s.

Upvotes: 1

Views: 104

Answers (1)

Tim Peters

Reputation: 70602

As I noted in a comment, how long the input list can get is crucial. Here's example code that solves the specific example you gave quickly enough, by post-processing the full set of partitions (to weed out duplicates, and to weed out partitions with pieces that are "too large"). But it would be horridly inefficient for "long" original lists:

def part(xs):  # generate all partitions of xs
    xs = tuple(xs)
    n = len(xs)
    def extend(i):
        if i == n:
            yield ()
            return
        this = xs[i]
        for o in extend(i+1):
            yield ((this,),) + o
            for j, p in enumerate(o):
                yield o[:j] + ((this,) + p,) + o[j+1:]
    for o in extend(0):
        yield o

def upart(xs):  # weed out dups, and partitions with a piece bigger than 2
    from collections import Counter
    seen = []
    for p in part(xs):
        if all(len(chunk) <= 2 for chunk in p):
            c = Counter(p)
            if c not in seen:
                seen.append(c)
                yield p

xs = [1,1,1,1,2,2]
for o in upart(xs):
    print(o)

That displays the 10 unique partitions you're looking for.

BTW, for xs = [1,1,1,1,1,1] it produces:

((1,), (1,), (1,), (1,), (1,), (1,))
((1, 1), (1,), (1,), (1,), (1,))
((1, 1), (1, 1), (1,), (1,))
((1, 1), (1, 1), (1, 1))
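To quantify why post-processing is hopeless for long lists: part() enumerates every set partition of the input, and the number of set partitions of n elements is the Bell number B(n), which grows explosively. A quick sketch using the standard recurrence (the function name here is my own, not part of the answer's code):

```python
from math import comb  # Python 3.8+

def bell(n):
    # Bell numbers via the recurrence B(m+1) = sum_k C(m, k) * B(k).
    B = [1]
    for m in range(n):
        B.append(sum(comb(m, k) * B[k] for k in range(m + 1)))
    return B[n]

print(bell(6))   # 203 raw partitions to post-process for a 6-element list
print(bell(20))  # 51724158235372 for a 20-element list
```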

A Custom Generator

As also noted in the comment, if post-processing the results of general building blocks is too inefficient, you need to "roll your own" from scratch. Here's one way that's very space-efficient, building unique results by construction (rather than by post-processing). There's really no "general way" of doing this - it requires analyzing the specific problem at hand, and writing code to exploit any quirks you can find:

def custom_gen(xs):
    from collections import Counter
    assert all(1 <= i <= 2 for i in xs)
    # There are only 5 unique pieces that can be used:
    pieces = [(1,), (2,), (1, 1), (2, 2), (1, 2)]
    countpieces = {piece: Counter(piece) for piece in pieces}

    def extend(i, n1, n2, result):
        # try all ways of extending with pieces[i];
        # there are n1 1's and n2 2's remaining to be used
        assert n1 >= 0 and n2 >= 0
        if n1 == n2 == 0:
            yield result
            return
        if i == len(pieces):  # dead end
            return
        piece = pieces[i]
        c = countpieces[piece]
        p1 = c[1]
        p2 = c[2]
        # What's the most number of this piece we could
        # possibly take?
        assert p1 or p2
        if p1:
            if p2:
                most = min(n1 // p1, n2 // p2)
            else:
                most = n1 // p1
        else:
            most = n2 // p2
        for count in range(most + 1):
            for t in extend(i+1,
                            n1 - count * p1,
                            n2 - count * p2,
                            result + [piece] * count):
                yield t

    c = Counter(xs)
    for t in extend(0, c[1], c[2], []):
        yield t

Note that the recursion never goes more than 5 deep (no matter how long the input list), so I'd wager this is about the most efficient that can be done without deeper analysis of the mathematics of the problem.
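As a sanity check on the counts (my own sketch, not part of the answer's code): because a partition here is fully determined by how many (1,1), (2,2), and (1,2) pairs it uses, the number of unique partitions can be counted directly without generating them:

```python
from collections import Counter

def count_partitions(xs):
    # A partition into pieces of size <= 2 is determined by the number
    # of (1,1), (2,2), and (1,2) pairs it uses; whatever elements are
    # left over must become singletons.
    c = Counter(xs)
    n1, n2 = c[1], c[2]
    total = 0
    for a in range(n1 // 2 + 1):                  # choices of (1,1) pairs
        for b in range(n2 // 2 + 1):              # choices of (2,2) pairs
            total += min(n1 - 2*a, n2 - 2*b) + 1  # choices of (1,2) pairs
    return total

print(count_partitions([1, 1, 1, 1, 2, 2]))  # 10, matching the listing above
print(count_partitions([1, 1, 1, 1, 1, 1]))  # 4, matching the all-ones output
```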

Upvotes: 4
