Reputation: 253
I'm trying to find a way to generate all possible "patterns" of length N
out of a list of K
letters. I've looked at similar questions but they all seem to be asking about combinations, permutations, etc. which is not exactly what I'm after.
For example, let K = 3
and N = 2
. That is, I want all 2-letter "patterns" that can be made with the letters [A, B, C]. AA is one such pattern. AB is another. And those are the only two. BB and CC are the same as AA, it's just "a letter, and then the same letter." Similarly, BA, BC, AC, etc. are the same as AB, it's just "a letter, and then a different letter." So for this simple case, there are only two patterns, and in fact this illustrates why K must be less than or equal to N (adding additional letters to choose from doesn't change anything).
If instead, K = 3, N = 3, then the five possible patterns would be AAA, AAB, ABA, ABB, and ABC. Every other permutation of three letters has a pattern that is identical to one of those five.
If K = 2 and N = 3, then there are just four possible patterns: AAA, AAB, ABA, ABB. (ABC is no longer a valid choice because I only have two letters to choose from.)
Of course, these examples are trivial to do by hand - I'm trying to create code that will generate all possible patterns for larger values of N and K. This may be more of a pure mathematical question but ultimately I need a Python function that will produce these so I thought I'd try here first to see if anyone knows or can think of an efficient way to do this.
Upvotes: 4
Views: 729
Reputation: 5802
I think you could also view it as a kind of graph problem: The characters are nodes and N
is the rank/depth of the graph (not sure about the correct terminology here), so for two characters and a depth of 2, you'd have a simple graph with the edges A -> A and A -> B. There are some additional constraints though, and they only really show with somewhat higher values for K
and N
, e.g., setting both to 4:
Note that this is not a fully connected graph, and that seems to be the key: from the second B
you cannot proceed to D
, because that wouldn't result in any new pattern (e.g., ABBD
would be the same as ABBC
). However, the first C
is allowed to visit the last A
, as this adds new combinations. So the rule seems to be that from every node you can either proceed to a node with a character that has already been seen or one node with a new character.
You can clearly see how you get a result of 5 paths for K=3
and N=3
, but if you set K
to 2, the C
node is lost so you lose one path, resulting in 4.
You could either make the graph first and then find all paths with existing tools (networkx), or build your own algorithm. E.g., below is an algorithm that computes the possible children for each node and recursively constructs the paths on the fly.
from string import ascii_uppercase
K = 3
N = 3
def build_paths(paths, K, N):
tmp = []
chars = ascii_uppercase[:K]
if N <= 1:
return paths
for prefix in paths:
# for each possible path
for suffix in chars[:chars.index(prefix[-1])+2]:
tmp.append(prefix+suffix)
paths = build_paths(tmp, K, N-1)
return paths
result = build_paths(['A'], K, N)
print(f"{K=}; {N=}; {len(result)} possible patterns:\n")
print(*result, sep='\n')
In theory/if well implemented, this should also be fairly efficient, as you only build possible patterns.
Upvotes: 1
Reputation: 146
One of the comments, from @JacobRR, was very close to what we need. Each "pattern" actually corresponds to partitioning of set {1, 2, ..., N} into K subsets. Each subset (even empty!) corresponds to the positions where letter l_k should be placed (l_1 = A, l_2 = B etc.). There's a demo here. https://thewebdev.info/2021/10/28/how-to-generate-set-partitions-in-python
For example, in case K=3, N=3, the partitions would be
{1,2,3}, ∅, ∅
{1}, {2, 3}, ∅
{2}, {1, 3}, ∅
{3}, {1, 2}, ∅
{1}, {2}, {3}
and for K=2, N=3, it's
{1,2,3}, ∅
{1}, {2, 3}
{2}, {1, 3}
{3}, {1, 2}
corresponding exactly to the given examples.
This question is also relevant.
https://codereview.stackexchange.com/questions/1526/finding-all-k-subset-partitions
I also wrote my own naive implementation.
import copy
N = 3
K = 2
iter = min(N, K)
def partitions(S, K):
if K == 1:
return [[S]]
if len(S) == 0:
return [[]]
result = []
S_new = copy.copy(S)
first = S_new.pop(0)
if (K-1 <= len(S_new)):
p1 = partitions(S_new, K-1)
for p in p1:
p.append([first])
result.append(p)
if (K <= len(S_new)):
p2 = partitions(S_new, K)
for p in p2:
for idx in range(len(p)):
p_new = copy.deepcopy(p)
p_new[idx].append(first)
result.append(p_new)
return result
for idx in range(1, iter+1):
print(partitions([i for i in range(0, N)], idx))
Upvotes: 2
Reputation: 476
I think a solution could be an inductive approach. The idea is, that only the relation among the positions is relevant, i.e. if position n equals position m.
So inductive means: you start from one pattern: A
Adding another letter, say B
gives two possible patterns (equal, unequal).
Adding another letter, gives the possible pattern (equal to first or second position, if second position differs from first)
So starting from 'A', you can inductively generate all possible patterns.
This here is not very fast, but I think it solves the problem:
import itertools as it
def get_patterns(pattern, letter):
lst = [pattern+s for s in set(pattern)]
return lst +[pattern+letter]
letters = 'ABCDEFGH'
patterns = ['A']
mx = 3
for l in range(mx):
for p in it.chain.from_iterable(patterns[l:]):
letter = list(set(letters).difference(p))[0]
patterns.append(get_patterns(p, letter))
print(patterns)
Upvotes: 1