Pijes
Pijes

Reputation: 65

How to set characters replacement limit with itertools replacement?

I need some limit for the all characters. I want to have max 2 of the same characters in the 7 places. Can I do that?

I get what I want with itertools combination_with_replacement. But I need a limit for every character.

from itertools import combinations_with_replacement

def combine(arr, s): 
    return list(combinations_with_replacement(symbols, s)) 

symbols = "Ga1SRT2"
max_length = 7 
set = 7
print(combine(symbols, set))

Example. Now I get one of 'G', 'G', 'G', 'a', 'S', 'S', 'R' and I want to just get 2 G, not 3.

Upvotes: 4

Views: 217

Answers (1)

Jean-François Fabre
Jean-François Fabre

Reputation: 140276

probably not the most efficient because it computes then discards a lot of data but post-processing counting the elements and keeping only the ones where there are max 2 repeats work:

from itertools import combinations_with_replacement
import collections

def combine(arr, s):
    return [x for x in combinations_with_replacement(symbols, s) if max(collections.Counter(x).values()) <= 2]

The if max(collections.Counter(x).values()) <= 2 in the list comprehension just keeps x if x members don't repeat more than twice.

We could also reuse a python equivalent of combinations_with_replacement (which is probably provided as native code on most platforms) provided in the documentation and perform a small & cheap algorithm optimization (note that we still need the filtering with collections.Counter)

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC with tweak to avoid
    # to issue obvious data that is going to be filtered out
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        if r - i < 3:
            yield tuple(pool[i] for i in indices)

The last yield is done inconditionally in the original source. My proposal is to check if r - i < 3, as if it's greater, indices is guaranteed to contain the same index more than twice.

With your input, the standard combinations_with_replacement algorithm from itertools yields 1716 values that need filtering with element count.

With my modification, it only yields 1255 values, so there are 25% less data to post-process. The replacement algorithm being pure python and not C, it's not sure that it's faster, though.

Upvotes: 3

Related Questions