Creating a list of sequences such that every distinct pair is counted a number of times

Question

Having some issues coming up with an algorithm to create a list of sequences such that every distinct pair is counted at a repeat number of times, where every element in the sequence can only have 1 of 2 possible other elements in it.

To be more clear, if I have the grammar such that the alphabet is [A, B, C, D, E, F]:

A -> B, E

B -> C, D

C -> A, F

D -> F, B

E -> D, A

F -> E, C

means the only possible pairs I can/should have are [(A, B), (A, E), (B, C), (B, D), (C, A), (C, F), (D, F), (D, B), (E, D), (E, A), (F, E), (F, C)].

So for example, a sequence could look something like:

A, B, D, F, E, A, E, D, F, E, A, E, ...

(A, B), (B, D), (E, D) have all occurred once so far and (A, E), (E, A), (F, E), (D, F) are no longer usable (as we've used them twice). We continue this pattern until we have no more usable pairs. For instance, the next element in the above sequence must be D, since we've exhausted the option (E, A).

From that, I could want to get all sequences starting with A up to a length of 24 that follows the above grammar, such that every pair in the sequence is repeated exactly twice.

My initial idea was to just make a bunch of nodes, then perform a generic DFS/BFS on it to get every path while counting the number of pairs that I see while going down the tree, return when there's too many pairs and add to a list of paths when I've reached a depth of 24. Are there better ways to get this done? I feel like I'm overthinking it.

(Note: It could also be a variable depth, and I could start with any element in the alphabet.)

PM 2Ring · Accepted Answer

Here's a recursive generator that makes your sequences. It was even easier than I thought: no backtracking required. :) To save memory, we use a single dict counts to keep track of the counts of each pair. We increment the count before the recursive call, and reset it after the recursive call.

The sequence length needs to be 25 to get each of the 12 pairs exactly twice in a sequence. The total number of sequences starting with a given letter is 18954.

grammar = {
    'A': 'BE',
    'B': 'CD',
    'C': 'AF',
    'D': 'BF',
    'E': 'AD',
    'F': 'CE',
}

pairs = [k + c for k, v in grammar.items() for c in v]
counts = dict.fromkeys(pairs, 0)

maxcount = 2
total_len = 25

def sequences(seq):
    if len(seq) == total_len:
        yield seq
        return

    k = seq[-1]
    for c in grammar[k]:
        pair = k + c
        if counts[pair] < maxcount:
            counts[pair] += 1
            yield from sequences(seq + c)
            counts[pair] -= 1

for i, seq in enumerate(sequences('A'), 1):
    print(i, seq)

some output

1 ABCABCAEAEDBDBDFCFCFEDFEA
2 ABCABCAEAEDBDBDFCFEDFCFEA
3 ABCABCAEAEDBDBDFEDFCFCFEA
4 ABCABCAEAEDBDFCFCFEDBDFEA
5 ABCABCAEAEDBDFCFEDBDFCFEA
6 ABCABCAEAEDBDFEDBDFCFCFEA
7 ABCABCAEAEDFCFCFEDBDBDFEA
8 ABCABCAEAEDFCFEDBDBDFCFEA
9 ABCABCAEAEDFEDBDBDFCFCFEA
10 ABCABCAEDBDBDFCFCFEAEDFEA

To get a quick count of the sequences without printing them all, you can do

print(sum(1 for _ in sequences('A')))

That runs in under 2 seconds on my old 2GHz 32 bit machine, running Python 3.6.0

Note that increasing maxcount considerably increases the number of sequences. Eg, for total_len = 19, maxcount = 2, the total number of sequences is 20622, for total_len = 19, maxcount = 3, the total number of sequences is 169192.

Creating a list of sequences such that every distinct pair is counted a number of times

Answers (2)

Related Questions