Reputation: 1653
I'm doing a study in which a kind of unambiguous fixed-length-unit bit sequence generating algorithm is used. Suppose we use k
bits, for simplicity, let's say k=3
, and we can convert integer 0 ~ 7
to 000~111
, named as code units
.
Now I'd like to get a sequence s
, in which:
(i). s
is a circular bit sequence, i.e. s_1, s_2 ... s_n =(functions as) s_2, s_3 ... s_n, s_1
(ii). s
is constructed by concating all code units
exactly once with fixed-length overlapping. Every bit in s
contributes to exactly k
original code units. So the length of s
is 2^k
(iii). With s
known, given any continuous k
0-1 bits(1 code unit), we can uniquely locate that code unit in s
.
I write a program to find s
under k=3
:
# -*- coding: utf-8 -*-
import math
from itertools import permutations
import numpy
def is_legal_seq(seq, k):
seen_set = set()
len_seq = len(seq)
seq = seq + seq[:k]
for i in range(len_seq):
cu = seq[i:i+k]
if cu in seen_set:
return False
else:
seen_set.add(cu)
if len(seen_set) != int(math.pow(2, k)):
return False
return True
k = 3
n_cu = int(math.pow(2, k))
root_str = (n_cu//2 - k) * '0' + n_cu // 2 * '1'
per_strs = list(set("".join(p) for p in permutations(root_str)))
per_strs = ["0" * k + i for i in per_strs]
# should equal to scipy.comb(n-k, n/2-k)
print(f"count permutations: {len(per_strs)}")
for i in range(len(per_strs)):
if is_legal_seq(per_strs[i], k):
print(f"satisfying: {per_strs[i]}")
Output:
count permutations: 5
satisfying: 00011101
satisfying: 00010111
Take 00011101
as an example, it can be interpreted as code units unambiguously in the order:
000, 001, 011, 111, 110, 101, 010, 100
.
And more importantly, there are no repetitions. Each possible code unit appears exactly once.
Here I used a fixed k
prefix '0' to avoid removing circular duplicates. And I find I need to test Combination(2^k-k, 2^k/2-k)
possible sequences to decide which one is legal.
But when changed to k=6
, it seems the programs needs much more time to output(2.2e16 times tests).
How can I find s
in other ways?
Upvotes: 0
Views: 55
Reputation: 80287
Seems you want to build de Brujin sequence over 0/1 alphabet.
Python code from above link with 2/3 output similar to your 00010111:
def de_bruijn(k, n: int) -> str:
"""
de Bruijn sequence for alphabet k
and subsequences of length n.
"""
try:
# let's see if k can be cast to an integer;
# if so, make our alphabet a list
_ = int(k)
alphabet = list(map(str, range(k)))
except (ValueError, TypeError):
alphabet = k
k = len(k)
a = [0] * k * n
sequence = []
def db(t, p):
if t > n:
if n % p == 0:
sequence.extend(a[1:p + 1])
else:
a[t] = a[t - p]
db(t + 1, p)
for j in range(a[t - p] + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
return "".join(alphabet[i] for i in sequence)
print(de_bruijn(2, 3))
print(de_bruijn("abcd", 2))
which prints
00010111
aabacadbbcbdccdd
Upvotes: 1