Finding all possible bin combinations

Question

I have 100,000 observations with a variable age in the range 18-80. I want to split the age variable into X bins. The bin ranges must not overlap and together should span the entire interval. For instance, with X = 4, one possible bin combination is:

18-30
31-45
46-57
58-80

How can I find all possible bin combinations given a value X?
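
To give a sense of scale, here is a minimal sketch that assumes ages are integers, so 18-80 has 63 distinct values and a combination is fixed by choosing X - 1 cut points among the 62 gaps between neighbouring ages. The names `count_bin_combinations` and `cuts_to_bins` are made up for illustration and do not come from any answer.

```python
from math import comb

AGE_MIN, AGE_MAX = 18, 80  # range stated in the question


def count_bin_combinations(n_bins, lo=AGE_MIN, hi=AGE_MAX):
    """Count the ways to split the integer ages lo..hi into n_bins contiguous bins."""
    n_values = hi - lo + 1  # 63 distinct ages for 18..80
    # Picking n_bins - 1 of the n_values - 1 gaps between ages fixes the bins.
    return comb(n_values - 1, n_bins - 1)


def cuts_to_bins(cuts, lo=AGE_MIN, hi=AGE_MAX):
    """Turn the sorted first ages of bins 2..n into inclusive (start, end) pairs."""
    starts = [lo] + list(cuts)
    ends = [c - 1 for c in cuts] + [hi]
    return list(zip(starts, ends))


print(count_bin_combinations(4))   # 37820
print(cuts_to_bins([31, 46, 58]))  # [(18, 30), (31, 45), (46, 57), (58, 80)]
```

So for X = 4 there are 37820 combinations before any further constraint, and the example above corresponds to the cut points 31, 46 and 58.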

Edit: Prompted by @Wolf, here is another constraint that I was thinking of implementing myself. Each bin must cover at least 10 values of the age variable. Since the range 18-80 contains 63 distinct ages, that limits X to X <= 6.
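
The arithmetic behind that bound can be sketched as follows, assuming "at least 10 values" means each bin spans at least 10 distinct integer ages. The helper names `max_bins` and `count_with_min_width` are hypothetical. The count uses the standard stars-and-bars argument: give every bin its minimum width first, then distribute the leftover ages freely.

```python
from math import comb

N_AGES = 80 - 18 + 1  # 63 distinct integer ages
MIN_WIDTH = 10        # minimum number of ages per bin (assumed reading of the constraint)


def max_bins(n_values, min_width):
    """Largest number of bins that can each span min_width values."""
    return n_values // min_width


def count_with_min_width(n_values, n_bins, min_width):
    """Count splits of n_values into n_bins contiguous bins of at least min_width each."""
    slack = n_values - n_bins * min_width  # ages left after every bin gets its minimum
    if slack < 0:
        return 0  # not enough ages to fill every bin
    return comb(slack + n_bins - 1, n_bins - 1)


print(max_bins(N_AGES, MIN_WIDTH))                 # 6
print(count_with_min_width(N_AGES, 4, MIN_WIDTH))  # 2600
print(count_with_min_width(N_AGES, 6, MIN_WIDTH))  # 56
print(count_with_min_width(N_AGES, 7, MIN_WIDTH))  # 0
```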

I tried to integrate this constraint into the function from @mkrieger1's answer, but my attempt fails:

from itertools import combinations


def bin_combinations(values, n):
    """
    Generate all possible combinations of splitting the values into n
    contiguous parts.

    >>> list(bin_combinations('abcd', 3))
    [['a', 'b', 'cd'], ['a', 'bc', 'd'], ['ab', 'c', 'd']]
    """

    for indices in combinations(range(1, len(values)), n - 1):
        li = list(indices)
        starts = [None] + li
        ends = li + [None]
        size = li[-1] - li[0]
        if size >= 10:
            yield [values[start:end] for start, end in zip(starts, ends)]
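
A likely reason the attempt above fails: `li[-1] - li[0]` only measures the distance between the first and the last cut, so it says nothing about the size of each individual bin (and nothing at all about the first and last bins). A possible fix, as a sketch only, is to compare every pair of consecutive boundaries. The name `bin_combinations_min_size` and the `min_size` parameter are my own additions, and the constraint is assumed to mean at least `min_size` items of `values` per bin.

```python
from itertools import combinations


def bin_combinations_min_size(values, n, min_size=10):
    """Yield every split of values into n contiguous parts of at least min_size items."""
    for indices in combinations(range(1, len(values)), n - 1):
        # Add the implicit outer boundaries so every bin has a start and an end.
        bounds = (0, *indices, len(values))
        pairs = list(zip(bounds, bounds[1:]))
        # Keep the split only if every single bin is large enough.
        if all(end - start >= min_size for start, end in pairs):
            yield [values[start:end] for start, end in pairs]


ages = list(range(18, 81))  # the 63 distinct ages from the question
splits = list(bin_combinations_min_size(ages, 4))
print(len(splits))                         # 2600
print([(b[0], b[-1]) for b in splits[0]])  # [(18, 27), (28, 37), (38, 47), (48, 80)]
```

This still enumerates every combination and filters afterwards, which is fine for 63 ages and X <= 6 (about 6.5 million candidates in the worst case) but would be wasteful for much longer inputs.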
