Whole Brain
Whole Brain

Reputation: 2167

Combinations with max length and per element max repetition values

My goal is to find a more efficient way to get all combinations of 1 to r mixed elements, where each family of element potentially has a different count and r is a parameter. The elements can be any (hashable) type. The result is a list of Counter-like dictionaries.

Here is an example data:

example = {1e-8: 3, "k": 2}
r = 5  # sum(example.values()) == 5 therefore all possible combinations for this example

The expected result is the following:

[{1e-08: 1},
 {'k': 1},
 {1e-08: 2},
 {1e-08: 1, 'k': 1},
 {'k': 2},
 {1e-08: 3},
 {1e-08: 2, 'k': 1},
 {1e-08: 1, 'k': 2},
 {1e-08: 3, 'k': 1},
 {1e-08: 2, 'k': 2},
 {1e-08: 3, 'k': 2}]

... correspondong to every possible combinations of 1, 2, 3, 4 and 5 elements.

The order preservation of the list is preferable (since Python 3.7+ preserves the order of keys inside dictionaries) but not mandatory.

Here is the solution I currently use:

from more_itertools import distinct_combinations
from collections import Counter

def all_combis(elements, r=None):

    if r is None:
        r = sum(elements.values())

    # "Flattening" by repeating the elements according to their count
    flatt = []
    for k, v in elements.items():
        flatt.extend([k] * v)

    for r in range(1, r+1):
        for comb in distinct_combinations(flatt, r):
            yield dict(Counter(comb))

list(all_combis(example))
# > The expected result

A real-life example has 300 elements distributed among 15 families. It is processed in ~13 seconds with a value of r=10 for about 2 million combinations, and ~31 seconds with r=11 for 4.5 million combinations.

I'm guessing there are better ways which avoid "flattening" the elements and/or counting the combinations, but I struggle to find any when each element has a different count.

Can you design a more time-efficient solution ?

Upvotes: 1

Views: 271

Answers (2)

big_bad_bison
big_bad_bison

Reputation: 1015

As @John Coleman said without the keys you may be able to speed things up.

This recursive approach starts at the end of the list and iterates until either the max sum is reached, or the max value of that element.

It returns a list, but as @John Coleman also showed, it is easy to add the keys later.

From my tests it appears to run in about half the time as your current implementation.

def all_combis(elements, r=None):
    if r is None:
        r = sum(elements)
    if r == 0:
        yield [0] * len(elements)
        return
    if not elements:
        yield []
        return

    elements = list(elements)
    element = elements.pop(0)
    for i in range(min(element + 1, r + 1)):
        for combi in all_combis(elements, r - i):
            yield [i] + combi

example = {1e-8: 3, "k": 2}
list(all_combis([val for val in example.values()]))

Output:

[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2], [3, 0], [3, 1], [3, 2]]

Upvotes: 0

John Coleman
John Coleman

Reputation: 52008

The keys are a bit of a distraction. They can be added in later. Mathematically, what you have is a vector of bounds, together with a global bound, and want to generate all tuples where each element is bounded by its respective bound, and the total is bounded by the global bound. This leads to a simple recursive approach based on the idea that if

(a_1, a_2, ..., a_n) <= (b_1, b_2, ..., b_n) with a_1 + ... a_n <= k

then

(a_2, ..., a_n) <= (b_2, ..., b_n) with a_2 + ... a_n <= k - a_1

This leads to something like:

def bounded_tuples(r,bounds):
    n = len(bounds)
    if r == 0:
        return [(0,)*n]
    elif n == 0:
        return [()]
    else:
        tuples = []
        for i in range(1+min(r,bounds[0])):
            tuples.extend((i,)+t for t in bounded_tuples(r-i,bounds[1:]))
        return tuples          

Note that this includes the solution with all 0's -- which you exclude, but that can be filtered out and the keys reintroduced:

def all_combis(elements, r=None):
    if r is None:
        r = sum(elements.values())
    for t in bounded_tuples(r,list(elements.values())):
        if max(t) > 0:
            yield dict(zip(elements.keys(),t))

For example:

example = {1e-8: 3, "k": 2}
for d in all_combis(example):
    print(d)

Output:

{1e-08: 0, 'k': 1}
{1e-08: 0, 'k': 2}
{1e-08: 1, 'k': 0}
{1e-08: 1, 'k': 1}
{1e-08: 1, 'k': 2}
{1e-08: 2, 'k': 0}
{1e-08: 2, 'k': 1}
{1e-08: 2, 'k': 2}
{1e-08: 3, 'k': 0}
{1e-08: 3, 'k': 1}
{1e-08: 3, 'k': 2}

Which is essentially what you have. The code could obviously be tweaked to eliminate dictionary entries with the value 0.

Timing with larger examples seems to suggest that my approach isn't any quicker than yours, though it still might give you some ideas.

Upvotes: 1

Related Questions