Grant Petty

Reputation: 1271

Customized hash function for Python

I would like to generate a human-readable hash with customized properties -- e.g., a short string of specified length consisting entirely of upper case letters and digits excluding 0, 1, O, and I (to eliminate visual ambiguity):

"arbitrary string"  -->  "E3Y7UM8"

A 7-character string of the above form could take on over 34 billion unique values which, for my purposes, makes collisions extremely unlikely. Security is also not a major concern.
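As a rough check on that figure, here is a small sketch that counts the possible values and estimates the collision risk with the standard birthday approximation. It assumes the hash spreads inputs uniformly over the output space; the function name `collision_probability` and the sample size of 10,000 strings are only illustrative.

```python
import math

# 26 letters + 10 digits, minus the four excluded characters 0, 1, O, I.
ALPHABET_SIZE = 26 + 10 - 4
LENGTH = 7

# Number of distinct 7-character strings over a 32-symbol alphabet.
space = ALPHABET_SIZE ** LENGTH  # 32**7 == 2**35 == 34359738368


def collision_probability(n_items, space_size):
    """Birthday approximation: chance that at least two of n_items
    uniformly distributed hashes coincide."""
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space_size))


print(space)
print(collision_probability(10_000, space))  # roughly 0.0015
```

Under that uniformity assumption the risk grows roughly with the square of the number of hashed strings, so the estimate is worth recomputing for the expected data size.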

Is there an existing module or routine that implements something like the above? Alternatively, can someone suggest a straightforward algorithm?

Upvotes: 1

Views: 1348

Answers (2)

gschizas

Reputation: 90

The approach you want resembles one-way password hashing. Since you are going for readable output, though, a proper password-hashing function is probably out of the question.

Here's what I would do:

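  1. Take an MD5 hash of the input string (an email address in the example below)
  2. Convert it to base32, whose standard alphabet (A-Z and 2-7) already excludes 0 and 1
  3. Replace any remaining hard-to-read characters (such as O and I) with readable ones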

Here's an example based on the above:

 import base64  # b32encode lives in the base64 module
 import hashlib

 email = "[email protected]"

 md5 = hashlib.md5()
 md5.update(email.encode('utf-8'))

 hash_in_bytes = md5.digest()

 result = base64.b32encode(hash_in_bytes)

 print(result)

 # Or you can remove the extra "=" padding at the end and decode to a str

 result = result.rstrip(b'=').decode('ascii')

 print(result)

Since a hash is a one-way function, you don't need to worry about reversing the process (you can't anyway). You can also replace any other characters you find hard to read with readable ones (I would go for lowercase versions of the characters, e.g. q instead of Q).
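The character replacement can be done in one pass, because the standard base32 alphabet and the alphabet described in the question both happen to have exactly 32 symbols. The following is a minimal sketch, assuming MD5 and a one-to-one mapping between the two alphabets; `readable_hash`, the two alphabet constants, and the particular mapping are illustrative choices, not part of any library.

```python
import base64
import hashlib

# Standard base32 alphabet (RFC 4648): A-Z followed by 2-7.
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# Alphabet from the question: uppercase letters and digits without 0, 1, O, I.
READABLE_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

# One-to-one table mapping each base32 symbol to a readable symbol.
TRANSLATION = str.maketrans(B32_ALPHABET, READABLE_ALPHABET)


def readable_hash(text, length=7):
    """Hypothetical helper: MD5 -> base32 -> readable alphabet, truncated."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    # A 16-byte digest encodes to 26 base32 characters plus "=" padding.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=")
    return encoded.translate(TRANSLATION)[:length]


print(readable_hash("arbitrary string"))
```

With an MD5 digest this yields at most 26 characters, so `length` should stay at or below that.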

More about base32 here: https://docs.python.org/3/library/base64.html

Upvotes: 2

perror

Reputation: 7426

You can simply take the first few characters of an MD5 hex digest. The truncated value should have approximately the same statistical properties as the whole digest anyway:

# Python 2 only: the md5 module was removed in Python 3
import md5
m = md5.new()
m.update("arbitrary string")
print(m.hexdigest()[:7])

The same code with the hashlib module:

import hashlib
m = hashlib.md5()
m.update("arbitrary string".encode("utf-8"))  # update() requires bytes in Python 3
print(m.hexdigest()[:7])
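Note that a hex digest only uses the characters 0-9 and a-f, so 7 hex characters give 16**7 = 268,435,456 possible values rather than the roughly 34 billion mentioned in the question. One possible variation, sketched here under the assumption that MD5 is acceptable, treats the digest as a large integer and writes its low-order digits in the question's alphabet; `short_hash` and `ALPHABET` are hypothetical names chosen for illustration.

```python
import hashlib

# Alphabet from the question: uppercase letters and digits, minus 0, 1, O, I.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def short_hash(text, length=7, alphabet=ALPHABET):
    """Hypothetical helper: hash text, then express the digest in
    base len(alphabet) and keep the `length` lowest-order digits."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    number = int.from_bytes(digest, "big")  # 128-bit integer
    chars = []
    for _ in range(length):
        # Peel off one base-N digit at a time.
        number, index = divmod(number, len(alphabet))
        chars.append(alphabet[index])
    return "".join(chars)


print(short_hash("arbitrary string"))
```

Unlike a fixed translation table, this works for an alphabet of any size, though a 128-bit digest only carries enough digits for about 25 characters of a 32-symbol alphabet.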

Upvotes: 2
