Why do I need to declare encoding before hashing in Python, and how can I do this?

Question

I am trying to create an AI-like chatbot, and one of its features is a login. I have used the login code before and it works fine, but I am now encountering difficulties with the code dealing with the hashing of the passwords. Here's the code:

import hashlib
...
register = input ("Are you a new user? (y/n) >")

password_file = 'passwords.txt'
if register.lower() == "y": 
    newusername = input ("What do you want your username to be? >")
    newpassword = input ("What do you want your password to be? >")

    newpassword = hashlib.sha224(newpassword).hexdigest()

    file = open(password_file, "a")
    file.write("%s,%s\n" % (newusername, newpassword))
    file.close()

elif register.lower() == ("n"):
    username = input ("What is your username? >")
    password = input ("What is your password? >")

    password = hashlib.sha224(password).hexdigest()

    print ("Loading...")
    with open(password_file) as f:
        for line in f:
            real_username, real_password = line.strip('\n').split(',')
            if username == real_username and password == real_password:
                success = True
                print ("Login successful!")
              #Put stuff here! KKC
    if not success:
        print("Incorrect login details.")

And here's the result I'm getting:

Traceback (most recent call last):
  File "/main.py", line 36, in <module>
    newpassword = hashlib.sha224(newpassword).hexdigest()
TypeError: Unicode-objects must be encoded before hashing

I looked up the encoding I think I should be using (latin-1), found the required syntax, and added it, but I still get the same result.

Martijn Pieters · Accepted Answer

Hashing works on bytes. str objects contain Unicode text, not bytes, so you must encode first. Pick an encoding that a) can handle all codepoints you are likely to encounter, and perhaps b) is also used by any other systems that need to produce the same hashes.
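As a rough illustration of the point above, the sketch below shows that passing a str raises TypeError, while encoded bytes hash fine, and that the choice of encoding changes the digest. The sample value is made up for the example; it is not from the question.

```python
import hashlib

# Hypothetical sample text containing a non-ASCII character.
text = "pässword"

# Passing a str directly fails, because the hash function only accepts bytes.
try:
    hashlib.sha224(text)
    raised = False
except TypeError:
    raised = True

# Encoding first works. Different encodings produce different bytes,
# and therefore different digests, for non-ASCII text.
utf8_digest = hashlib.sha224(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.sha224(text.encode("latin-1")).hexdigest()

print(raised)
print(utf8_digest == latin1_digest)
```

This is why point b) matters: every system that has to reproduce a hash needs to agree on the same encoding.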

If you are the only user of the hashes, just pick UTF-8; it can encode all of Unicode and is the most compact choice for Western text:

newpassword = hashlib.sha224(newpassword.encode('utf8')).hexdigest()

The return value from hash.hexdigest() is a Unicode str value, so you are safe to compare that with the str values you read from your file.
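To tie this back to the login code in the question, here is a minimal sketch of the register-and-check round trip. The helper names hash_password and check_login are hypothetical, and a list of lines stands in for passwords.txt so the example is self-contained. Returning False at the end also avoids relying on a success variable that is only set when a match is found.

```python
import hashlib


def hash_password(password):
    """Encode the str as UTF-8 bytes, then return the hex digest as a str."""
    return hashlib.sha224(password.encode("utf-8")).hexdigest()


def check_login(lines, username, password):
    """Return True if any 'username,hash' line matches the given credentials."""
    hashed = hash_password(password)
    for line in lines:
        real_username, real_password = line.rstrip("\n").split(",")
        # Both sides are str values, so a plain comparison is fine.
        if username == real_username and hashed == real_password:
            return True
    return False


# Stand-in for the contents of the password file (assumed format: name,hash).
stored = ["alice,%s\n" % hash_password("s3cret")]

print(check_login(stored, "alice", "s3cret"))  # True
print(check_login(stored, "alice", "wrong"))   # False
```

The same encoding has to be used when registering and when logging in; otherwise the hashes of non-ASCII passwords will not match.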
