Reputation:
I want to convert a bytes value to a string. There are previous questions related to mine, but I run into trouble when I try to compute the MD5 hash of a file's contents this way:
import hashlib
with open("C:\\boot.ini","r") as f:
    r=f.read()
a=hashlib.md5()
a.update(r.encode('utf8'))
bytes_data=a.digest()
print(bytes_data)
r=type(bytes_data)
print(r) # <-- Just to be sure, it is in bytes
myString=bytes_data.decode(encoding='UTF-8')
I got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 1: invalid continuation byte
I understand the cause of the problem thanks to this question. However, I am hashing many different files, so I have no control over the bytes. How can I resolve this problem?
Upvotes: 5
Views: 7937
Reputation: 1124298
The hash.digest() return value is not a UTF-8-encoded string. Don't try to decode it; it is a sequence of bytes in the range 0-255 and these bytes do not represent text.
Not every bytes value encodes text; a digest is one that does not.
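To illustrate the point, here is a minimal sketch that hashes an empty input and attempts the same decode as in the question. The helper name try_decode_digest is made up for this example; only hashlib.md5(), digest() and bytes.decode() come from the standard library.

```python
import hashlib

def try_decode_digest(data: bytes) -> str:
    """Hypothetical helper: try to read an MD5 digest as UTF-8 text."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes, not text
    try:
        return digest.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Arbitrary bytes are usually not a valid UTF-8 sequence.
        return f"not text: {exc.reason}"

digest = hashlib.md5(b"").digest()
print(len(digest))              # an MD5 digest is always 16 bytes long
print(try_decode_digest(b""))   # the decode fails for this input
```

Some digests may happen to decode without an error, but the resulting characters would still be meaningless, so decoding is the wrong tool either way.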
Use hash.hexdigest() if you want something printable instead. This method returns the digest bytes expressed as hexadecimal numbers (two hex characters per digest byte). This is the form commonly used when sharing an MD5 digest with others.
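As a rough sketch of how this could look for arbitrary files: the function name md5_hex_of_file and the chunk size are assumptions made for this example, and the file is opened in binary mode ("rb") rather than in text mode as in the question, so the raw bytes are hashed without any decoding step. The demo writes a temporary file containing b"abc" so the snippet runs on its own.

```python
import hashlib
import os
import tempfile

def md5_hex_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hypothetical helper: MD5 of a file as a 32-character hex string."""
    md5 = hashlib.md5()
    # Binary mode: hash the bytes on disk, no text decoding involved.
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Demo on a throwaway file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    sample_path = tmp.name

hex_digest = md5_hex_of_file(sample_path)
os.remove(sample_path)
print(hex_digest)  # 900150983cd24fb0d6963f7d28e17f72
```

Note that hashing the bytes read in binary mode can give a different result from hashing text that was read in text mode and re-encoded, for example when newline translation applies. If you already hold the raw digest, digest().hex() yields the same string as hexdigest().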
Upvotes: 8