pir

Reputation: 5923

Finding all possible bin combinations

I have 100,000 observations with a variable, age, in the range 18-80. I want to divide the age variable into X bins. The bin ranges must not overlap and together should span the entire interval. For instance, with X = 4, one possible bin combination could be:

How can I find all possible bin combinations given a value X?
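Before enumerating, it helps to know how many combinations there are. Assuming ages are integers and bins fall on integer boundaries, splitting the 63 ages 18-80 into X contiguous, non-empty bins amounts to choosing X - 1 cut points among the 62 gaps between neighbouring ages, so there are C(62, X - 1) combinations. A minimal sketch (the function name count_bin_combinations is hypothetical):

```python
from math import comb


def count_bin_combinations(n_values: int, n_bins: int) -> int:
    """Number of ways to split n_values ordered values into n_bins
    contiguous, non-empty bins: choose n_bins - 1 cut points among
    the n_values - 1 gaps between neighbouring values."""
    if n_bins < 1 or n_bins > n_values:
        return 0
    return comb(n_values - 1, n_bins - 1)


ages = range(18, 81)  # assumption: integer ages 18..80 inclusive (63 values)
print(count_bin_combinations(len(ages), 4))  # 37820
```

For X = 4 that is 37,820 combinations, so listing all of them is still feasible.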

Edit: Prompted by @Wolf, here is another constraint that I was thinking of implementing myself: each bin must hold at least 10 distinct values of the age variable. That, of course, limits X to X <= 6.

I've tried to integrate this into the answer by @mkrieger1, but failed.

from itertools import combinations

def bin_combinations(values, n):
    """
    Generate all possible combinations of splitting the values into n
    contiguous parts.

    >>> list(bin_combinations('abcd', 3))
    [['a', 'b', 'cd'], ['a', 'bc', 'd'], ['ab', 'c', 'd']]
    """

    for indices in combinations(range(1, len(values)), n - 1):
        li = list(indices)
        starts = [None] + li
        ends = li + [None]
        # Note: this is the distance between the first and last cut point,
        # not the size of each bin (it is always 0 when n == 2).
        size = li[-1] - li[0]
        if size >= 10:
            yield [values[start:end] for start, end in zip(starts, ends)]

Upvotes: 1

Views: 456

Answers (1)

mkrieger1

Reputation: 23256

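The most appropriate tool for finding combinations is the combinations function from the itertools standard library module. Splitting a sequence of length L into n contiguous, non-empty parts is the same as choosing n - 1 cut positions out of the L - 1 positions between neighbouring elements.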

from itertools import combinations

def bin_combinations(values, n):
    """
    Generate all possible combinations of splitting the values into n
    contiguous parts.

    >>> list(bin_combinations('abcd', 3))
    [['a', 'b', 'cd'], ['a', 'bc', 'd'], ['ab', 'c', 'd']]
    """
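    # Each tuple of n - 1 increasing cut positions defines one split.
    for indices in combinations(range(1, len(values)), n - 1):
        # Part i is values[starts[i]:ends[i]]; None means "from the start" or "to the end".
        starts = [None] + list(indices)
        ends = list(indices) + [None]
        yield [values[start:end] for start, end in zip(starts, ends)]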
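The function above does not enforce the edit's constraint that each bin holds at least 10 ages. One way to add it, sketched here as a hypothetical bin_combinations_min_size (not part of the answer above), is to place the cut points recursively and never choose a position that would leave a bin too small, instead of generating every combination and filtering afterwards. Filtering would be slow here: with 63 ages and X = 6 it walks through C(62, 5) = 6,471,002 candidate cut sets to find the few valid ones. The sketch assumes "at least 10 values" means at least 10 distinct integer ages, which matches the X <= 6 bound.

```python
def bin_combinations_min_size(values, n, min_size=1):
    """
    Yield every way to split values into n contiguous parts, each holding
    at least min_size values (hypothetical helper, not from the answer).

    >>> list(bin_combinations_min_size('abcde', 2, min_size=2))
    [['ab', 'cde'], ['abc', 'de']]
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        # The remaining values form the last bin, if it is large enough.
        if len(values) >= min_size:
            yield [values]
        return
    # The first bin is values[:cut]; stop early enough to leave room for
    # n - 1 more bins of at least min_size values each.
    for cut in range(min_size, len(values) - (n - 1) * min_size + 1):
        for rest in bin_combinations_min_size(values[cut:], n - 1, min_size):
            yield [values[:cut]] + rest


ages = range(18, 81)  # assumption: integer ages 18..80 inclusive (63 values)
first = next(bin_combinations_min_size(ages, 6, min_size=10))
# Lowest and highest age in each bin of the first valid split:
print([(b[0], b[-1]) for b in first])
# [(18, 27), (28, 37), (38, 47), (48, 57), (58, 67), (68, 80)]
print(sum(1 for _ in bin_combinations_min_size(ages, 6, min_size=10)))  # 56
```

With min_size=1 it yields the same splits, in the same order, as bin_combinations. For the ages 18-80 with min_size=10 it yields 56 combinations for X = 6 and none for X = 7, consistent with the X <= 6 bound.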

Upvotes: 1
