Reputation: 527
Let's say you have a string:
mystring = "Welcome to the InterStar cafe, serving you since 2412!"
I am looking for a way to convert that string into a number, like say:
encoded_string = number_encode(mystring)
print(encoded_string)
08713091353153848093820430298
..that you can convert back to the original string.
decoded_string = number_decode(encoded_string)
print(decoded_string)
"Welcome to the InterStar cafe, serving you since 2412!"
It doesn't have to be cryptographically secure, but it does have to put out the same number for the same string regardless of what computer it's running on.
Upvotes: 4
Views: 15187
Reputation: 830
I think the other answers are better than this one, but purely mathematically, there is an obvious way of doing this: interpret the message as an integer written in a different base, whose digits are your own set of symbols.
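def frombase(s, sym):
    # Read s as a number whose digits are the entries of sym
    b = len(sym)
    n = 0
    bl = 1  # place value of the current digit
    for a in reversed(s):
        n += sym.index(a) * bl
        bl *= b
    return n

def tobase(n, sym):
    # Write n using the entries of sym as digits
    b = len(sym)
    s = ''
    while n > 0:
        kl = n % b
        n //= b
        s += sym[kl]
    # Digits were collected least significant first
    return s[::-1] if s else sym[0]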
Then, for your specific case:
symbols = [
' ', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D',
'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
'Y', 'Z', ',', '.', '?', '!', '-', ':', ';',
'_', '"', "'", '#', '$', '%', '&', '/', '(', ')',
'=', '+', '*', '<', '>', '~'
]
encodeword = lambda w: frombase(w, symbols)
decodeword = lambda n: tobase(n, symbols)
Note that the first symbol (" ") acts as the digit zero, so any leading spaces are dropped on decoding, just as 0001 = 1.
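To make the leading-symbol caveat concrete, here is a minimal sketch with a hypothetical four-symbol alphabet (toy is not part of the answer). frombase and tobase are restated in a compact but equivalent form so the block runs on its own.

```python
def frombase(s, sym):
    """Interpret s as a number whose digits are drawn from sym."""
    b = len(sym)
    n = 0
    for ch in s:
        n = n * b + sym.index(ch)
    return n


def tobase(n, sym):
    """Write n using the entries of sym as digits."""
    b = len(sym)
    digits = []
    while n > 0:
        n, r = divmod(n, b)
        digits.append(sym[r])
    return ''.join(reversed(digits)) if digits else sym[0]


# Hypothetical toy alphabet; ' ' plays the role of the digit zero.
toy = [' ', 'a', 'b', 'c']

n = frombase('cab', toy)        # 3*16 + 1*4 + 2
roundtrip = tobase(n, toy)

# A leading ' ' is a leading zero, so it does not survive the round trip.
lossy = tobase(frombase(' cab', toy), toy)
```

Here 'cab' and ' cab' map to the same integer, so only one of them can be recovered. If leading spaces matter, one option is to put a symbol that never starts a message in the zero position.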
If you really want to represent all possible symbols, you can write them as a sequence of their ord values (integers), separated by the , symbol. Then you encode that sequence in base 11, using the ten digits plus the , symbol:
symbols = [',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] # , is zero
txt2int = lambda w: encodeword(','.join(str(ord(x)) for x in w))
int2txt = lambda n: ''.join(chr(int(x)) for x in decodeword(n).split(','))
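Regarding the size of the returned integer: the number of decimal digits in txt2int(w) or encodeword(w) is O(len(w)), i.e. it grows linearly with the length of the input. With the 86-symbol alphabet above, each character contributes roughly two decimal digits, so encodeword('Hi there!') is a number with about 18 digits.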
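The ord-based variant and its size behaviour can be checked with a self-contained sketch. The names text_to_int and int_to_text are hypothetical stand-ins for txt2int and int2txt, written out without the shared symbols global; the empty-string handling is an added assumption.

```python
DIGITS = ',0123456789'   # ',' is the zero digit, as in the answer


def text_to_int(text):
    """Join the code points with ',' and read the result as base 11."""
    joined = ','.join(str(ord(ch)) for ch in text)
    n = 0
    for ch in joined:
        n = n * len(DIGITS) + DIGITS.index(ch)
    return n


def int_to_text(n):
    """Inverse of text_to_int; 0 is taken to mean the empty string."""
    out = []
    while n > 0:
        n, r = divmod(n, len(DIGITS))
        out.append(DIGITS[r])
    joined = ''.join(reversed(out))
    if not joined:
        return ''
    return ''.join(chr(int(part)) for part in joined.split(','))


sample = 'Hi there! \u2615'          # includes a non-ASCII character
decoded = int_to_text(text_to_int(sample))

# The digit count grows with the input length.
short_len = len(str(text_to_int('Hi')))
long_len = len(str(text_to_int('Hi' * 10)))
```

Because every code point is written out in decimal and separated by a comma, this variant spends several base-11 digits per character, so its output is noticeably longer than the fixed-alphabet version.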
Upvotes: 1
Reputation: 2401
If you simply want to make a string unreadable to a human, you might use base64, via base64.b64encode(s, altchars=None) and base64.b64decode(s, altchars=None, validate=False):
Note that both functions require a bytes-like object, so string literals need the b prefix, e.g. b"I am a bytes-like string":
>>> import base64
>>> coded = base64.b64encode(b"Welcome to the InterStar cafe, serving you since 2412!")
>>> print(coded)
b'V2VsY29tZSB0byB0aGUgSW50ZXJTdGFyIGNhZmUsIHNlcnZpbmcgeW91IHNpbmNlIDI0MTIh'
>>> print(base64.b64decode(coded))
b'Welcome to the InterStar cafe, serving you since 2412!'
If you already have your strings, you can convert them with str.encode('utf-8'):
>>> myString = "Welcome to the InterStar cafe, serving you since 2412!"
>>> bString = myString.encode('utf-8')
>>> print(bString)
b'Welcome to the InterStar cafe, serving you since 2412!'
>>> print(bString.decode())
Welcome to the InterStar cafe, serving you since 2412!
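The two steps above (str to bytes, then Base64) can be combined into a pair of helpers. This is a minimal sketch; obscure and reveal are hypothetical names, and Base64 only obscures the text, it does not encrypt it.

```python
import base64


def obscure(text):
    """Return text as an ASCII Base64 string (obfuscation, not encryption)."""
    return base64.b64encode(text.encode('utf-8')).decode('ascii')


def reveal(token):
    """Inverse of obscure: Base64 string back to the original text."""
    return base64.b64decode(token).decode('utf-8')


message = "Welcome to the InterStar cafe, serving you since 2412!"
token = obscure(message)
original = reveal(token)
```

The result is deterministic across machines because both the text encoding (UTF-8) and the Base64 alphabet are fixed.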
If you really need to convert the string to only numbers, you would have to use @ShadowRanger's answer.
Upvotes: 1
Reputation: 155497
Encode it to a bytes in a fixed encoding, then convert the bytes to an int with int.from_bytes. The reverse operation is to call .to_bytes on the resulting int, then decode back to str:
mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8')
myint = int.from_bytes(mybytes, 'little')
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes.decode('utf-8')
print(recoveredstring)
This has one flaw: if the string ends in NUL characters ('\0' / '\x00'), you'll lose them (switching to 'big' byte order would lose them from the front instead). If that's a problem, you can always pad with a '\x01' explicitly and remove it on the decode side, so there are no trailing 0s to lose:
mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8') + b'\x01' # Pad with 1 to preserve trailing zeroes
myint = int.from_bytes(mybytes, 'little')
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes[:-1].decode('utf-8') # Strip pad before decoding
print(recoveredstring)
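Wrapped into the number_encode and number_decode names from the question, the padded approach might look like the sketch below. The function bodies follow the snippet above; the sample string tricky is a hypothetical input chosen to show the trailing-NUL case.

```python
def number_encode(text):
    """Map text to a non-negative int; the 0x01 sentinel guards trailing NULs."""
    data = text.encode('utf-8') + b'\x01'
    return int.from_bytes(data, 'little')


def number_decode(number):
    """Inverse of number_encode."""
    size = (number.bit_length() + 7) // 8
    data = number.to_bytes(size, 'little')
    return data[:-1].decode('utf-8')   # drop the sentinel byte


tricky = 'ends with NULs\x00\x00'
recovered = number_decode(number_encode(tricky))

# Without the sentinel, the trailing NUL bytes vanish on the way back.
bare = int.from_bytes(tricky.encode('utf-8'), 'little')
lost = bare.to_bytes((bare.bit_length() + 7) // 8, 'little').decode('utf-8')
```

Since the encoding and byte order are both spelled out explicitly, the same string yields the same integer on any machine.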
Upvotes: 13