Reputation: 1271
I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):
"arbitrary string" --> "E3Y7UM8"
A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.
Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?
Upvotes: 1
Views: 1348
Reputation: 90
The method you should be using has similarities with password one-way encryption. Of course since you are going for readable, a good password function is probably out of the question.
Here's what I would do:
Here's an example based on the above:
import base64 # base32 is a function in base64
import hashlib
email = "[email protected]"
md5 = hashlib.md5()
md5.update(email.encode('utf-8'))
hash_in_bytes = md5.digest()
result = base64.b32encode(hash_in_bytes)
print(result)
# Or you can remove the extra "=" at the end
result = result.strip(b'=')
Since it's a one-way function (hash), you obviously don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find non-readable with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q)
More about base32 here: https://docs.python.org/3/library/base64.html
Upvotes: 2
Reputation: 7426
You can simply truncate the beginning of an MD5sum algorithm. It should have approximately the same statistical properties than the whole string anyway:
import md5
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])
Same code with hashlib
module:
import hashlib
m = hashlib.md5()
m.update("arbitrary string")
print(m.hexdigest()[:7])
Upvotes: 2