What could cause this error : UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 568: invalid start byte

Question

I'm very new to coding and Python, so I'm really confused by this error. Here's my code from an exercise where I need to find the most used word in a directory containing multiple files:

import pathlib

directory = pathlib.Path('/Users/k/files/Code/exo')

stats ={}

for path in directory.iterdir():
    file = open(str(path))
    text = file.read().lower()

    punctuation  = (";", ".")
    for mark in punctuation:
        text = text.replace(mark, "")


    for word in text.split():
        if word in stats:

            stats[word] = stats[word] + 1
        else:
            stats[word] = 1

most_used_word = None
score_max = 0
for word, score in stats.items():
    if score > score_max:
        score_max = score
        most_used_word = word

print("The most used word is:", most_used_word, score_max)

Here's what I get:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    text = file.read().lower()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 568: invalid start byte

What could cause this error?

Cromo · Accepted Answer

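Your file probably contains bytes that are not valid UTF-8 (the byte 0xff never occurs in UTF-8 text), so Python fails when it tries to decode the file with the default codec. This happens when a file was saved in another encoding or is not a text file at all. To make the UnicodeDecodeError disappear, you can try reading in 'rb' mode, which skips decoding and returns bytes instead of str: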

file = open(str(path), 'rb')

On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.

(From the docs)
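
As a rough sketch of how binary mode could be applied to the word count from the question: in Python 3, read() on a file opened with 'rb' returns bytes, so the text still has to be decoded before the str-based replace() and split() calls will work. The version below assumes the files are mostly UTF-8 text and uses errors="replace" so that stray bytes such as 0xff become U+FFFD instead of raising. That choice, the helper name count_words, and the demo file a.txt are illustrative assumptions, not part of the original code.

```python
import pathlib
import tempfile
from collections import Counter


def count_words(directory):
    """Count words across the regular files in `directory` (hypothetical helper)."""
    stats = Counter()
    for path in pathlib.Path(directory).iterdir():
        if not path.is_file():
            continue  # skip sub-directories
        # Binary mode returns bytes and never raises UnicodeDecodeError.
        with open(path, "rb") as file:
            raw = file.read()
        # Decode explicitly; assumption: bytes that are not valid UTF-8
        # are replaced with U+FFFD rather than stopping the program.
        text = raw.decode("utf-8", errors="replace").lower()
        for mark in (";", "."):
            text = text.replace(mark, "")
        stats.update(text.split())
    return stats


# Demo on a throwaway directory holding one file with an invalid 0xff byte.
with tempfile.TemporaryDirectory() as tmp:
    (pathlib.Path(tmp) / "a.txt").write_bytes(b"Hello world. Hello; \xff again hello")
    demo_stats = count_words(tmp)

most_used_word, score_max = demo_stats.most_common(1)[0]
print("The most used word is:", most_used_word, score_max)
```

If the files are known to use a different encoding, passing it directly, as in open(path, encoding="latin-1"), is usually a better fit than replacing undecodable bytes.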
