Reputation: 591
I have a file that I want to convert to UTF-8 encoding.
When I try to read it, I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte
My plan was to read the file and then write it back out as UTF-8, but the read itself fails.
Here is my code:
# convert all files into utf_8 format
import os
import io

path_directory = "some path string"
directory = os.fsencode(path_directory)
for file in os.listdir(directory):
    file_name = os.fsdecode(file)
    file_path_source = path_directory + file_name
    file_path_dest = "some address to destination file"
    with open(file_path_source, "r") as f1:
        text = f1.read()
    with io.open(file_path_dest, "w+", encoding='utf8') as f2:
        f2.write(text)
    file_path = ""
    file_name = ""
    text = None
The full error is:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-47-59e5e52ddd40> in <module>()
10 with open(file_path,"r") as f1:
11 print(type(f1))
---> 12 text=f1.read()
13 with io.open(file_path.replace("ref_sum","ref_sum_utf_8"),"w+",encoding='utf8') as f2:
14 f2.write(text)
/home/afsharizadeh/anaconda3/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte
How can I convert my files to UTF-8 without reading them?
Upvotes: 1
Views: 6555
Reputation: 3984
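If the file is not UTF-8, you have to tell Python which encoding it is actually in when you open it. In Python 3, open() decodes text files with the locale's preferred encoding by default, which is UTF-8 on most Linux systems (in Python 2 the implicit default is ASCII):
open(file_path_source, "r", encoding='ISO-8859-1')
In this case the byte 0xe9 suggests the file is most likely ISO-8859-1, where that byte is "é", so you have to pass that encoding when reading. You can then write the text back out with encoding='utf8' as you already do.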
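As a minimal sketch of the whole round trip, assuming the source files really are ISO-8859-1 (if they are Windows-1252 or something else, the src_encoding argument would need to change): the helper name convert_to_utf8 and the temporary file names are made up for illustration. The file still has to be read, only with the right encoding.

```python
import os
import tempfile


def convert_to_utf8(src_path, dest_path, src_encoding="ISO-8859-1"):
    """Decode src_path using src_encoding and rewrite it as UTF-8.

    src_encoding is an assumption about the input; change it if the
    files are in a different encoding.
    """
    # Decode with the encoding the file is actually in.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Encode the decoded text as UTF-8 on the way out.
    with open(dest_path, "w", encoding="utf-8", newline="") as dest:
        dest.write(text)


# Demonstration on a throwaway file that contains the byte 0xe9.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "latin1.txt")
    dest_path = os.path.join(tmp, "utf8.txt")
    with open(src_path, "wb") as f:
        f.write(b"caf\xe9\n")  # "café" encoded as ISO-8859-1
    convert_to_utf8(src_path, dest_path)
    with open(dest_path, "rb") as f:
        converted = f.read()  # the same text, now as UTF-8 bytes
```

In the loop from the question, the same function could be called once per file, building each path with os.path.join(path_directory, file_name) so the directory separator is not lost.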
Upvotes: 1