lincr
lincr

Reputation: 1653

Generating unambiguous fixed-length circular bit sequence

I'm doing a study in which a kind of unambiguous fixed-length-unit bit sequence generating algorithm is used. Suppose we use k bits, for simplicity, let's say k=3, and we can convert integer 0 ~ 7 to 000~111, named as code units. Now I'd like to get a sequence s, in which:

(i). s is a circular bit sequence, i.e. s_1, s_2 ... s_n =(functions as) s_2, s_3 ... s_n, s_1

(ii). s is constructed by concating all code units exactly once with fixed-length overlapping. Every bit in s contributes to exactly k original code units. So the length of s is 2^k

(iii). With s known, given any continuous k 0-1 bits(1 code unit), we can uniquely locate that code unit in s.

I write a program to find s under k=3:

# -*- coding: utf-8 -*-

import math
from itertools import permutations
import numpy


def is_legal_seq(seq, k):
    seen_set = set()
    len_seq = len(seq)
    seq = seq + seq[:k]
    for i in range(len_seq):
        cu = seq[i:i+k]
        if cu in seen_set:
            return False
        else:
            seen_set.add(cu)
    if len(seen_set) != int(math.pow(2, k)):
         return False
    return True


k = 3
n_cu = int(math.pow(2, k))

root_str = (n_cu//2 - k) * '0' + n_cu // 2 * '1'
per_strs = list(set("".join(p) for p in permutations(root_str)))
per_strs = ["0" * k + i for i in per_strs]

# should equal to scipy.comb(n-k, n/2-k)
print(f"count permutations: {len(per_strs)}")

for i in range(len(per_strs)):
    if is_legal_seq(per_strs[i], k):
        print(f"satisfying: {per_strs[i]}")

Output:

count permutations: 5
satisfying: 00011101
satisfying: 00010111

Take 00011101 as an example, it can be interpreted as code units unambiguously in the order: 000, 001, 011, 111, 110, 101, 010, 100. And more importantly, there are no repetitions. Each possible code unit appears exactly once.

Here I used a fixed k prefix '0' to avoid removing circular duplicates. And I find I need to test Combination(2^k-k, 2^k/2-k) possible sequences to decide which one is legal.

But when changed to k=6, it seems the programs needs much more time to output(2.2e16 times tests). How can I find s in other ways?

Upvotes: 0

Views: 55

Answers (1)

MBo
MBo

Reputation: 80287

Seems you want to build de Brujin sequence over 0/1 alphabet.

Python code from above link with 2/3 output similar to your 00010111:

def de_bruijn(k, n: int) -> str:
    """
    de Bruijn sequence for alphabet k
    and subsequences of length n.
    """
    try:
        # let's see if k can be cast to an integer;
        # if so, make our alphabet a list
        _ = int(k)
        alphabet = list(map(str, range(k)))

    except (ValueError, TypeError):
        alphabet = k
        k = len(k)

    a = [0] * k * n
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return "".join(alphabet[i] for i in sequence)

print(de_bruijn(2, 3))
print(de_bruijn("abcd", 2))
which prints

00010111
aabacadbbcbdccdd

Upvotes: 1

Related Questions