Mahsa

Reputation: 591

Cannot read a file: "UnicodeDecodeError: 'utf-8' codec can't decode"

I have some files that I want to convert to UTF-8.

When I try to read one of them, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte

My plan was to read each file and then write it back out as UTF-8, but the read itself fails.

Here is my code:

#convert all files into utf_8 format
import os
import io
path_directory="some path string"
directory = os.fsencode(path_directory)
for file in os.listdir(directory):
    file_name=os.fsdecode(file)
    file_path_source=path_directory+file_name
    file_path_dest="some address to destination file"
    with open(file_path_source,"r") as f1:
        text=f1.read()
    with io.open(file_path_dest,"w+",encoding='utf8') as f2:
        f2.write(text)
    file_path=""
    file_name=""
    text=None

The full traceback is:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-47-59e5e52ddd40> in <module>()
     10     with open(file_path,"r") as f1:
     11         print(type(f1))
---> 12         text=f1.read()
     13     with io.open(file_path.replace("ref_sum","ref_sum_utf_8"),"w+",encoding='utf8') as f2:
     14         f2.write(text)

/home/afsharizadeh/anaconda3/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte

How can I convert my files to UTF-8 when reading them fails?

Upvotes: 1

Views: 6555

Answers (1)

0decimal0

Reputation: 3984

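The file is not UTF-8, so Python cannot decode it as UTF-8. In Python 3, open() in text mode falls back to the locale's preferred encoding when you don't pass one (UTF-8 on most Linux systems, which matches your traceback); Python 2's default encoding was ASCII. If the file is in some other encoding, you have to pass that encoding when you open it for reading: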

io.open(file_path_source,"r",encoding='ISO-8859-1')

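In this case the encoding is most likely ISO-8859-1 (Latin-1): byte 0xe9 is "é" in that encoding, whereas in UTF-8 it can only start a multi-byte sequence, which is why the decoder complains about an invalid continuation byte. If the files came from Windows, cp1252 is another candidate worth trying.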
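Putting that together with the loop from the question, a minimal sketch of the whole conversion might look like the following. It assumes every file in the source directory really is ISO-8859-1; convert_to_utf8 and its parameter names are hypothetical, not something from the original code.

```python
from pathlib import Path


def convert_to_utf8(src_dir, dest_dir, source_encoding="ISO-8859-1"):
    """Re-encode every regular file in src_dir as UTF-8 under dest_dir.

    source_encoding is an assumption about the input files; change it
    if they turn out to be in a different encoding.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories
        # Decode using the encoding the file is actually stored in...
        text = src.read_text(encoding=source_encoding)
        # ...then write the same text back out encoded as UTF-8.
        dest = dest_dir / src.name
        dest.write_text(text, encoding="utf-8")
        converted.append(dest)
    return converted
```

You would call it as convert_to_utf8("some path string", "some destination directory"). Writing to a separate directory keeps the originals untouched, which matters because ISO-8859-1 maps every possible byte to a character: a wrong guess will not raise an error, it will just produce garbled text, so check a few converted files by eye.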

Upvotes: 1
