David542

Reputation: 110422

Checksumming filepaths that may not be ASCII

Let's say I have two filepaths:

/my/file/path.mov
/mé/fileé/pathé.mov

If I do something like:

{hashlib.md5(path).hexdigest() for path in paths}

Then I'll sometimes get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 217: ordinal not in range(128)

My quickfix was something along the lines of:

{hashlib.md5(path).hexdigest() for path in paths if path.isascii()}

But what would be a better way to deal with this?

Upvotes: 0

Views: 36

Answers (2)

Piero

Reputation: 686

You are not specifying an encoding. hashlib needs bytes, so you have to encode the path first, using "utf-" followed by the number of the variant you want (for example utf-8 or utf-16).

Normally it should be fine like this:

hashlib.md5(path.encode("utf-8"))
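
One thing worth keeping in mind when picking the encoding: the digest is computed over the encoded bytes, so the same path hashed under two different encodings will not match. A small sketch of that (the variable names are only for illustration):

```python
import hashlib

path = "/mé/fileé/pathé.mov"

# Same text, two encodings: the byte sequences differ, so the digests differ too.
digest_utf8 = hashlib.md5(path.encode("utf-8")).hexdigest()
digest_utf16 = hashlib.md5(path.encode("utf-16")).hexdigest()

print(digest_utf8 == digest_utf16)
```

So whichever encoding you choose, use the same one everywhere you compare checksums.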

Upvotes: 1

Silvio Mayolo

Reputation: 70347

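hashlib works on bytes, not text, so you need to provide an encoding yourself instead of relying on the implicit ASCII conversion that raises the error. In full generality, you can use UTF-8, which can represent any Unicode character.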

hashlib.md5(path.encode("utf-8"))
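
Applied to the set comprehension from the question, that could look roughly like the sketch below. The helper name path_checksum is hypothetical (it is not part of hashlib), and the sketch assumes the paths are either text or already-encoded bytes:

```python
import hashlib

def path_checksum(path, encoding="utf-8"):
    """Return the MD5 hex digest of the path string itself (not the file contents)."""
    # hashlib needs bytes; encode text explicitly, and pass bytes through unchanged.
    if not isinstance(path, bytes):
        path = path.encode(encoding)
    return hashlib.md5(path).hexdigest()

paths = ["/my/file/path.mov", "/mé/fileé/pathé.mov"]

# No path has to be filtered out any more, ASCII or not.
checksums = {path_checksum(p) for p in paths}
```

This keeps the non-ASCII paths in the result rather than silently dropping them, which is what the isascii() filter did.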

Upvotes: 1
