Reputation: 1009
Is there a method that converts a string of text such as 'you' to a number other than
y = tuple('you')
for k in y:
    k = ord(k)
which only converts one character at a time?
Upvotes: 9
Views: 20636
Reputation: 2621
Though there are a number of ways to fulfill this task, I prefer the hashing way because it has the following nice properties
import hashlib
# there are a number of hashing functions you can pick, and they provide tags of different lengths and security levels.
hashing_func = hashlib.md5
# the lambda func does three things
# 1. hash a given string using the given algorithm
# 2. retrieve its hex hash tag
# 3. convert hex to integer
str2int = lambda s : int(hashing_func(s.encode()).hexdigest(), 16)
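A brief usage sketch of the hash-based conversion above. The bucket helper and its name str2bucket are illustrative assumptions, not part of the original answer; it simply reduces the large integer modulo a chosen range.

```python
import hashlib

def str2int(s, hashing_func=hashlib.md5):
    """Hash the UTF-8 bytes of s and read the hex digest as an integer."""
    return int(hashing_func(s.encode()).hexdigest(), 16)

def str2bucket(s, max_num):
    """Hypothetical helper: map s to an integer in [0, max_num)."""
    return str2int(s) % max_num

print(str2int('you'))          # an integer of at most 128 bits for MD5
print(str2bucket('you', 256))  # the same input always gives the same bucket
```

Note that this mapping is one-way: unlike the byte-based answers, the original string cannot be recovered from the integer.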
To see that the resulting integers are uniformly distributed, we first need a random string generator
import string
import numpy as np
# candidate characters
letters = string.ascii_letters
# total number of candidates
L = len(letters)
# control the seed or prng for reproducible results
prng = np.random.RandomState(1234)
# define the string prng of length 10
prng_string = lambda : "".join([letters[k] for k in prng.randint(0, L, size=(10))])
Now we generate a sufficient number of random strings and obtain the corresponding integers
ss = [prng_string() for x in range(50000)]
vv = np.array([str2int(s) for s in ss])
Let us check the randomness by comparing the theoretical mean and standard deviation of a uniform distribution and those we observed.
for max_num in [256, 512, 1024, 4096]:
    ints = vv % max_num
    print("distribution comparsions for max_num = {:4d} \n\t[theoretical] {:7.2f} +/- {:8.3f} | [observed] {:7.2f} +/- {:8.3f}".format(
        max_num, max_num/2., np.sqrt(max_num**2/12), np.mean(ints), np.std(ints)))
Finally, you will see the results below, which indicate that the resulting numbers are very close to uniformly distributed.
distribution comparsions for max_num = 256
[theoretical] 128.00 +/- 73.901 | [observed] 127.21 +/- 73.755
distribution comparsions for max_num = 512
[theoretical] 256.00 +/- 147.802 | [observed] 254.90 +/- 147.557
distribution comparsions for max_num = 1024
[theoretical] 512.00 +/- 295.603 | [observed] 512.02 +/- 296.519
distribution comparsions for max_num = 4096
[theoretical] 2048.00 +/- 1182.413 | [observed] 2048.67 +/- 1181.422
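For reference, the theoretical values printed above are those of a continuous uniform distribution on [0, N), with N = max_num. Since vv % max_num only takes the integer values 0, ..., N-1, the exact discrete-uniform moments differ slightly:

```latex
\[
\mu_{\text{cont}} = \frac{N}{2}, \qquad \sigma_{\text{cont}} = \frac{N}{\sqrt{12}}
\]
\[
\mu_{\text{disc}} = \frac{N-1}{2}, \qquad \sigma_{\text{disc}} = \sqrt{\frac{N^{2}-1}{12}}
\]
```

For N = 256 the discrete mean is 127.5 and the discrete standard deviation is about 73.900, against the printed 128.00 and 73.901, so the difference is negligible at these sizes.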
It is worth noting that other posted answers may not have these properties.
For example, @poke's convertToNumber solution will give
distribution comparsions for max_num = 256
[theoretical] 128.00 +/- 73.901 | [observed] 93.48 +/- 17.663
distribution comparsions for max_num = 512
[theoretical] 256.00 +/- 147.802 | [observed] 220.71 +/- 129.261
distribution comparsions for max_num = 1024
[theoretical] 512.00 +/- 295.603 | [observed] 477.67 +/- 277.651
distribution comparsions for max_num = 4096
[theoretical] 2048.00 +/- 1182.413 | [observed] 1816.51 +/- 1059.643
Upvotes: 2
Reputation: 367
I was trying to find a way to convert a numpy character array into a unique numeric array for further processing. I implemented the following functions, including the approaches from the answers by @poke and @falsetrue (those methods gave me some trouble when the strings were too long). I also added a hash option (a hash is a fixed-size integer that identifies a particular value).
import numpy as np
def str_to_num(x):
    """Converts a string into a unique concatenated UNICODE representation
    Args:
        x (string): input string
    Raises:
        ValueError: x must be a string
    """
    if isinstance(x, str):
        x = [str(ord(c)) for c in x]
        x = int(''.join(x))
    else:
        raise ValueError('x must be a string.')
    return x
def chr_to_num(x):
    return int.from_bytes(x.encode(), 'little')
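def char_arr_to_num(arr, type = 'hash'):
    """Converts a character array into a unique hash representation.
    Args:
        arr (np.array): numpy character array.
        type (str): conversion method, one of 'unicode', 'byte' or 'hash'.
    Raises:
        ValueError: type is not one of the supported methods
    """
    if type == 'unicode':
        vec_fun = np.vectorize(str_to_num)
    elif type == 'byte':
        vec_fun = np.vectorize(chr_to_num)
    elif type == 'hash':
        vec_fun = np.vectorize(hash)
    else:
        # Without this branch an unknown type left vec_fun undefined
        raise ValueError("type must be 'unicode', 'byte' or 'hash'.")
    out = np.apply_along_axis(vec_fun, 0, arr)
    out = out.astype(float)
    return out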
a = np.array([['x', 'y', 'w'], ['x', 'z','p'], ['y', 'z', 'w'], ['x', 'w','y'], ['w', 'z', 'q']])
char_arr_to_num(a, type = 'unicode')
char_arr_to_num(a, type = 'byte')
char_arr_to_num(a, type = 'hash')
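One caveat with the 'hash' option: for str inputs, Python's built-in hash() is salted per interpreter process unless PYTHONHASHSEED is set, so the resulting numbers can differ between runs. A minimal sketch of a run-stable alternative, assuming that the first 8 bytes of a SHA-256 digest are an acceptable identifier; stable_hash is a hypothetical helper name, not part of the code above.

```python
import hashlib

def stable_hash(s):
    """Hypothetical helper: first 8 bytes of SHA-256 as an unsigned integer."""
    digest = hashlib.sha256(s.encode()).digest()
    return int.from_bytes(digest[:8], 'little')

rows = [['x', 'y', 'w'], ['x', 'z', 'p']]
# Apply the helper element by element; equal strings get equal numbers
numbers = [[stable_hash(c) for c in row] for row in rows]
print(numbers[0][0] == numbers[1][0])  # True, both entries are 'x'
```

Like any fixed-size hash, this identifies values only up to (very unlikely) collisions, and it cannot be inverted.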
Upvotes: 1
Reputation: 531918
Treat the string as a base-255 number.
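from functools import reduce

y = 'you'
# Reverse the digits to make reconstructing the string more efficient
# (iterating over bytes already yields integers in Python 3)
digits = reversed(y.encode())
n = reduce(lambda acc, d: acc*255 + d, digits, 0)
# Collect the recovered bytes, least significant digit first
new_y = bytearray()
while n > 0:
    n, b = divmod(n, 255)
    new_y.append(b)
assert y == new_y.decode()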
(Note this is essentially the same as poke's answer, but written explicitly rather than using available methods for converting between a byte string and an integer.)
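To make that relationship concrete, a small sketch: when the base is 256 (one digit per possible byte value) rather than 255, the explicit reduction yields exactly the integer that int.from_bytes returns with 'little' byte order. The sample string and variable names are illustrative assumptions.

```python
from functools import reduce

data = 'you'.encode()

# Fold from the last byte to the first, so the first byte ends up
# as the least significant base-256 digit
n = reduce(lambda acc, d: acc * 256 + d, reversed(data), 0)

assert n == int.from_bytes(data, 'little')
print(n)  # 7696249
```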
Upvotes: 3
Reputation: 388113
In order to convert a string to a number (and the reverse), you should always work with bytes first. Since you are using Python 3, strings are actually Unicode strings and as such may contain characters that have an ord() value higher than 255. bytes, however, have just a single byte per character, so you should always convert between those two types first.
So basically, you are looking for a way to convert a bytes string (which is basically a list of bytes, a list of numbers 0–255) into a single number, and the inverse. You can use int.to_bytes and int.from_bytes for that:
import math
def convertToNumber (s):
    return int.from_bytes(s.encode(), 'little')
def convertFromNumber (n):
    return n.to_bytes(math.ceil(n.bit_length() / 8), 'little').decode()
>>> convertToNumber('foo bar baz')
147948829660780569073512294
>>> x = _
>>> convertFromNumber(x)
'foo bar baz'
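A quick round-trip check of the same idea, as a sketch. The names to_number and from_number are illustrative stand-ins for the two functions above, and the ceiling division is written in integer arithmetic. It also shows one limitation: trailing NUL characters are not recovered, because high-order zero bytes do not change the integer.

```python
def to_number(s):
    return int.from_bytes(s.encode(), 'little')

def from_number(n):
    # (bits + 7) // 8 is the integer form of ceil(bits / 8)
    return n.to_bytes((n.bit_length() + 7) // 8, 'little').decode()

# Round trip works for ASCII and non-ASCII text alike
for text in ['you', 'foo bar baz', 'naïve ☃']:
    assert from_number(to_number(text)) == text

print(to_number('you'))                        # 7696249
print(repr(from_number(to_number('a\x00'))))   # 'a' (the trailing NUL is lost)
```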
Upvotes: 32
Reputation: 369324
k is overwritten. Collect items using something like a list comprehension:
>>> text = 'you'
>>> [ord(ch) for ch in text]
[121, 111, 117]
To get the text back, use chr, and join the characters using str.join:
>>> numbers = [ord(ch) for ch in text]
>>> ''.join(chr(n) for n in numbers)
'you'
Upvotes: 2