Paul Brodersen
Paul Brodersen

Reputation: 13041

Compute all ways to bin a series of integers into N bins, where each bin only contains contiguous numbers

I want find all possible ways to map a series of (contiguous) integers M = {0,1,2,...,m} to another series of integers N = {0,1,2,...,n} where m > n, subject to the constraint that only contiguous integers in M map to the same integer in N.

The following piece of python code comes close (start corresponds to the first element in M, stop-1 corresponds to the last element in M, and nbins corresponds to |N|):

import itertools
def find_bins(start, stop, nbins):
    if (nbins > 1):
        return list(list(itertools.product([range(start, ii)], find_bins(ii, stop, nbins-1))) for ii in range(start+1, stop-nbins+2))
    else:
        return [range(start, stop)]

E.g

In [20]: find_bins(start=0, stop=5, nbins=3)
Out[20]: 
[[([0], [([1], [2, 3, 4])]),
([0], [([1, 2], [3, 4])]),
([0], [([1, 2, 3], [4])])],
[([0, 1], [([2], [3, 4])]), 
([0, 1], [([2, 3], [4])])],
[([0, 1, 2], [([3], [4])])]]

However, as you can see the output is nested, and for the life of me, I cant find a way to properly amend the code without breaking it.

The desired output would look like this:

In [20]: find_bins(start=0, stop=5, nbins=3)
Out[20]: 
[[(0), (1), (2, 3, 4)],
[(0), (1, 2), (3, 4)],
[(0), (1, 2, 3), (4)],
[(0, 1), (2), (3, 4)], 
[(0, 1), (2, 3), (4)],
[(0, 1, 2), (3), (4)]]

Upvotes: 1

Views: 347

Answers (2)

Tim Peters
Tim Peters

Reputation: 70725

I suggest a different approach: a partitioning into n non-empty bins is uniquely determined by the n-1 distinct indices marking the boundaries between the bins, where the first marker is after the first element, and the final marker before the last element. itertools.combinations() can be used directly to generate all such index tuples, and then it's just a matter of using them as slice indices. Like so:

def find_nbins(start, stop, nbins):
    from itertools import combinations
    base = range(start, stop)
    nbase = len(base)
    for ixs in combinations(range(1, stop - start), nbins - 1):
        yield [tuple(base[lo: hi])
               for lo, hi in zip((0,) + ixs, ixs + (nbase,))]

Then, e.g.,

for x in find_nbins(0, 5, 3):
    print(x)

displays:

[(0,), (1,), (2, 3, 4)]
[(0,), (1, 2), (3, 4)]
[(0,), (1, 2, 3), (4,)]
[(0, 1), (2,), (3, 4)]
[(0, 1), (2, 3), (4,)]
[(0, 1, 2), (3,), (4,)]

EDIT: Making it into 2 problems

Just noting that there's a more general underlying problem here: generating the ways to break an arbitrary sequence into n non-empty bins. Then the specific question here is applying that to the sequence range(start, stop). I believe viewing it that way makes the code easier to understand, so here it is:

def gbins(seq, nbins):
    from itertools import combinations
    base = tuple(seq)
    nbase = len(base)
    for ixs in combinations(range(1, nbase), nbins - 1):
        yield [base[lo: hi]
               for lo, hi in zip((0,) + ixs, ixs + (nbase,))]

def find_nbins(start, stop, nbins):
    return gbins(range(start, stop), nbins)

Upvotes: 2

Paul Brodersen
Paul Brodersen

Reputation: 13041

This does what I want; I will gladly accept simpler, more elegant solutions:

def _split(start, stop, nbins):
    if (nbins > 1):
        out = []
        for ii in range(start+1, stop-nbins+2):
            iterator = itertools.product([range(start, ii)], _split(ii, stop, nbins-1))
            for item in iterator:
                out.append(item)
        return out
    else:
        return [range(start, stop)]

def _unpack(nested):
    unpacked = []
    if isinstance(nested, (list, tuple)):
        for item in nested:

            if isinstance(item, tuple):
                for subitem in item:
                    unpacked.extend(_unpack(subitem))

            elif isinstance(item, list):
                unpacked.append([_unpack(subitem) for subitem in item])

            elif isinstance(item, int):
                unpacked.append([item])

        return unpacked

    else: # integer
        return nested

def find_nbins(start, stop, nbins):
    nested = _split(start, stop, nbins)
    unpacked = [_unpack(item) for item in nested]
    return unpacked

Upvotes: 0

Related Questions