Reputation: 19
So, i want to create program, that will fastly return index of string in permutations list:
a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc
aaa
aab
aac
aba
abb
abc
I use something like this to generate this list:
from itertools import product
charset = 'abc'
max_len = 3
for i in range(1, max_len + 1):
for combo in list(product(charset, repeat=i)):
print(''.join(combo))
So, it will be 6 for 'ac', 23 for 'bab', 'caa' for 31, etc.
I copied some code from other questions, but it doesn't really works fine
import math
class Page:
@staticmethod
def from_index(index: int, chars: str) -> str:
perm = str()
for i in range(math.ceil(index / len(chars)) - 1, -1, -1):
perm += chars[int((index / (len(chars) ** i)) % len(chars))]
return perm
class Index:
@staticmethod
def from_page(perm: str, chars: str) -> int:
ids = [chars.find(x) + 1 for x in perm]
base = (len(chars) ** len(perm)) / len(perm)
index = 0
for i in range(len(perm)):
index += base * ids[i]
base /= len(perm)
return int(index)
p.s. I dont want to generate whole list, but use algorithm to get values. It should work fast :)
Upvotes: 0
Views: 121
Reputation: 19
I found answer for my question. Numeral system with base of charset len is something that I wanted to create, it works in such way.
from functools import lru_cache
from modules.config import charset
@lru_cache
def encode(n):
try:
return charset[n]
except IndexError:
raise Exception(f"Cannot encode {n}")
@lru_cache
def decode(s):
try:
return charset.index(s)
except ValueError:
raise Exception(f"Cannot decode {s}")
@lru_cache
def dec_to_base(dec=0, base=16):
if dec < base:
return encode(dec)
else:
return dec_to_base(dec // base, base) + encode(dec % base)
@lru_cache
def base_to_dec(s, base=16, rec_pow=0):
if s == str():
return 0
else:
return decode(s[-1]) * (base ** rec_pow) + base_to_dec(s[0:-1], base, rec_pow + 1)
class Page:
@staticmethod
def from_index(index: int) -> str:
return dec_to_base(index, len(charset))
class Index:
@staticmethod
def from_page(page: str) -> int:
return base_to_dec(page, len(charset))
config.py
import json
config = json.load(open('config.json'))
charset = ''.join(sorted(list(set(' ' + config['charset']))))
It returns ' abc'
, with leading space (space is zero)
print(Index.from_page('abcabcabc')) # 112347
print(Page.from_index(112347)) # abcabcabc
It works in way I really needed!
Upvotes: 1
Reputation: 43467
You haven't really properly defined what you want in you question, but from your examples it seems like you want to find the index of a string in an ordered list of cartesian products of lengths from 1 to max_len
, for a given charset.
You can do this without building the list by counting the contributions of each character from left to right. Note that for a given length x
, there are len(charset)^x
elements. You can use this to progressively build up the answer.
So, it will be 6 for 'ac', 23 for 'bab', 'caa' for 31, etc.
First of all, both the length of the charset and the max_len
are 3
, so we have 3^1
possibilities for length 1 elements, 3^2
for length 2 elements and 3^3
for length 3 elements.
ac
: since we have a length 2
string, we already skipped all length 1
strings, of which there are 3
.
A c
on the third position means we skipped 2
, so we are the the 6
th.
bab
: first of all, length 3 means we skipped all length 1s and 2s so we count from 3+3^2=12
(leftmost) b
means we have skipped all that start with a
and continue with any of the 3^2=9
length 2
elements.
a
in the second means we haven't skipped any for the second
b
in the last contributes 1
So we are at all of the above +1
: 12+9+1+1=23
.
Similarly for caa
.
And so on. Python code:
def find_index(string, max_len, charset):
str_len = len(string)
charset_len = len(charset)
index = 1 + sum(charset_len ** i for i in range(1, str_len))
for i, c in enumerate(string):
prev_pow = charset_len**(str_len - i - 1)
contrib = charset.find(c) * prev_pow
index += contrib
return index
print(find_index('caa', 3, 'abc'))
You can probably further optimize this by not calling **
at each step, but it's quite fast as it is.
Similar reasoning should allow you to go from an index to the string at that position.
Upvotes: 0