I have 100,000 observations with a variable `age` in the range 18-80. I want to find X bins based on the `age` variable. The bin ranges must not overlap and should together span the entire interval. For instance, with X = 4, one possible bin combination could be:

How can I find all possible bin combinations given a value X?
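To get a sense of the scale, here is a small sketch that assumes the bins are built from whole ages 18 through 80 (63 distinct values). Splitting 63 ordered values into X non-empty contiguous bins means choosing X - 1 cut points out of the 62 gaps between neighbouring ages, so the number of combinations is C(62, X - 1). The helper name `count_bin_combinations` is hypothetical.

```python
from math import comb

# Assumption: bins are built from whole ages 18..80 (63 distinct values).
ages = range(18, 81)
n_gaps = len(ages) - 1  # 62 possible cut points between neighbouring ages


def count_bin_combinations(x):
    """Number of ways to split the ages into x non-empty contiguous bins."""
    return comb(n_gaps, x - 1)


for x in range(1, 7):
    print(x, count_bin_combinations(x))
```

This prints 1, 62, 1891, 37820, 557845 and 6471002 for X = 1 to 6, so listing every combination gets expensive quickly as X grows.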
Edit: Prompted by @Wolf, here is another constraint that I was thinking of implementing myself: each bin must hold at least 10 distinct values of the `age` variable. That of course limits X, so X <= 6.
I've tried to integrate this into the answer by @mkrieger1, but failed.
```python
from itertools import combinations

def bin_combinations(values, n):
    """
    Generate all possible combinations of splitting the values into n
    contiguous parts.
    >>> list(bin_combinations('abcd', 3))
    [['a', 'b', 'cd'], ['a', 'bc', 'd'], ['ab', 'c', 'd']]
    """
    for indices in combinations(range(1, len(values)), n - 1):
        li = list(indices)
        starts = [None] + li
        ends = li + [None]
        # Attempted constraint: this measures only the distance between the
        # first and the last cut point, not the size of each bin.
        size = li[-1] - li[0]
        if size >= 10:
            yield [values[start:end] for start, end in zip(starts, ends)]
```
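The check `li[-1] - li[0]` in the attempt above never passes for n = 2 (there is only one cut, so the difference is 0) and raises `IndexError` for n = 1 (the list of cuts is empty). A minimal sketch of a per-bin check computes every bin's length from consecutive cut points and keeps the split only if all of them are large enough. The name `bin_combinations_min` and its `min_size` parameter are hypothetical.

```python
from itertools import combinations


def bin_combinations_min(values, n, min_size):
    """
    Split values into n contiguous parts, keeping only the splits in which
    every part has at least min_size elements.

    Hypothetical helper: a per-bin version of the check in the question.
    """
    for indices in combinations(range(1, len(values)), n - 1):
        # Add the sequence boundaries so every bin has an explicit start and end.
        cuts = [0, *indices, len(values)]
        # Length of each bin = distance between consecutive cut points.
        sizes = [end - start for start, end in zip(cuts, cuts[1:])]
        if all(size >= min_size for size in sizes):
            yield [values[start:end] for start, end in zip(cuts, cuts[1:])]


print(list(bin_combinations_min('abcdefg', 2, 3)))
# [['abc', 'defg'], ['abcd', 'efg']]
```

This still generates every unconstrained split and discards the ones that fail, which can be slow for larger X.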
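Splitting a sequence into n contiguous parts is the same as choosing n - 1 cut points among the positions between neighbouring elements, and the most appropriate way to enumerate those choices is the `combinations` function from the `itertools` standard library module.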
```python
from itertools import combinations

def bin_combinations(values, n):
    """
    Generate all possible combinations of splitting the values into n
    contiguous parts.
    >>> list(bin_combinations('abcd', 3))
    [['a', 'b', 'cd'], ['a', 'bc', 'd'], ['ab', 'c', 'd']]
    """
    # Each choice of n - 1 cut points between 1 and len(values) - 1 defines one split.
    for indices in combinations(range(1, len(values)), n - 1):
        # Bin k runs from starts[k] to ends[k]; None means "from the start" / "to the end".
        starts = [None] + list(indices)
        ends = list(indices) + [None]
        yield [values[start:end] for start, end in zip(starts, ends)]
```
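For the minimum-size constraint from the question's edit, a possibly faster alternative (a sketch, not part of the answer above) is to generate only valid splits instead of filtering: give every bin `min_size` elements up front, then distribute the remaining `len(values) - n * min_size` elements. `itertools.combinations_with_replacement` yields non-decreasing tuples, which can serve as the cumulative number of extra elements before each cut. The name `bin_combinations_min_fast` and the assumption of integer ages 18..80 are illustrative.

```python
from itertools import combinations_with_replacement


def bin_combinations_min_fast(values, n, min_size):
    """
    Yield every split of values into n contiguous parts, each holding at
    least min_size elements, without generating invalid splits first.

    Hypothetical helper, shown as a sketch.
    """
    spare = len(values) - n * min_size  # elements left after each bin gets min_size
    if n < 1 or spare < 0:
        return  # no valid split exists
    # extras[k] = spare elements placed in the first k + 1 bins (non-decreasing).
    for extras in combinations_with_replacement(range(spare + 1), n - 1):
        # Cut k follows k + 1 minimum-size bins plus the spare elements used so far.
        cuts = [0] + [(k + 1) * min_size + e for k, e in enumerate(extras)] + [len(values)]
        yield [values[start:end] for start, end in zip(cuts, cuts[1:])]


ages = list(range(18, 81))  # assumption: integer ages 18..80
splits = list(bin_combinations_min_fast(ages, 6, 10))
print(len(splits))  # 56
print([(b[0], b[-1]) for b in splits[0]])
# [(18, 27), (28, 37), (38, 47), (48, 57), (58, 67), (68, 80)]
```

With 63 ages, X = 6 and at least 10 ages per bin, only 3 ages are left to distribute, which gives C(8, 5) = 56 valid splits instead of the roughly 6.5 million unconstrained ones.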