user3522371

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 1: invalid continuation byte

I want to convert a bytes value to a string. There are previous questions related to mine, but none covers my case. I am trying to compute the MD5 hash of a file's contents this way:

import hashlib
with open("C:\\boot.ini","r") as f:
    r=f.read()
a=hashlib.md5()
a.update(r.encode('utf8'))
bytes_data=a.digest()
print(bytes_data)
r=type(bytes_data)
print(r) # <-- Just to be sure, it is in bytes 
myString=bytes_data.decode(encoding='UTF-8')

I got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 1: invalid continuation byte

I understand the cause of the problem thanks to this question. However, I am computing hashes for many different files and have no control over the resulting bytes, so how can I resolve this problem?

Upvotes: 5

Views: 7937

Answers (1)

Martijn Pieters

Reputation: 1124298

The hash.digest() return value is not a UTF-8-encoded string. Don't try to decode it; it is a sequence of bytes in the range 0-255 and these bytes do not represent text.

Not all byte sequences encode text; this digest is one that does not.

Use hash.hexdigest() if you want something printable instead. This method returns the same bytes expressed as hexadecimal numbers (two hex characters per digest byte). This is the commonly used form when sharing an MD5 digest with others.
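
As a rough sketch of what that could look like for arbitrary files: the helper name md5_hex and the chunk_size default below are my own choices for illustration, not part of hashlib. Opening the file in binary mode ("rb") is an assumption about what you want here; it hashes the bytes on disk directly and avoids the decode/encode round trip in the question's code.

```python
import hashlib

def md5_hex(path, chunk_size=65536):
    """Return the MD5 digest of the file at `path` as a hex string.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    h = hashlib.md5()
    # Binary mode: hash the raw bytes, with no text decoding involved.
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# The raw digest is 16 arbitrary bytes; the hex digest is a 32-character str.
digest = hashlib.md5(b"hello").digest()
hex_digest = hashlib.md5(b"hello").hexdigest()
print(len(digest), len(hex_digest))
# If you already hold the raw digest, bytes.hex() gives the same string.
print(digest.hex() == hex_digest)
```

With that helper, md5_hex("C:\\boot.ini") would return an ordinary str that is safe to print or store, whatever the file contains.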

Upvotes: 8
