Suman

Reputation: 55

UnicodeDecodeError in Python 3.5 when trying to open text files

I am trying to open some configuration files with the following command:

f=open(os.path.join(root, name),mode='rt',errors='ignore')

However, I get the following error after upgrading to Python 3.5:


for line in f:
  File "C:\python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 57: character maps to <undefined>

This code worked fine when I ran it under Python 2.7. I have tried specifying the encoding as utf8 or latin1, but neither works now. Can anyone suggest a way forward?

It would be fine if I could ignore the error and move on to the next line. How can I skip the offending part?

Upvotes: 0

Views: 1026

Answers (2)

user6037143

Reputation: 566

You can use codecs.open:

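import codecs
import os

# codecs.open appends 'b' to the mode itself when an encoding is given,
# so pass 'r' here; 'rt' would become 'rtb' and raise ValueError on Python 3
f = codecs.open(os.path.join(root, name), mode='r', encoding='utf-8')
for line in f:
    pass  # do something with line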

Also, I don't think the problem is with your code but rather with the Windows command prompt, whose encoding is 'cp1252'. I ran into this issue a long time ago. Basically, if you run your script in the Windows command prompt, the program crashes as soon as your code calls print on the Unicode data, because the console cannot encode and display some of the characters.

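You can also get around this problem by printing an escaped representation of the data. That is, change every print call to print("%r" % line). Note that on Python 3, %r leaves printable non-ASCII characters unescaped, whereas %a escapes them.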
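To illustrate the escaped-printing idea, here is a minimal sketch. The helper name console_safe and the sample string are made up for the example; it assumes Python 3 and uses the built-in ascii(), which escapes every non-ASCII character so the output can be written to any console encoding.

```python
def console_safe(text):
    """Return a pure-ASCII representation of text (hypothetical helper)."""
    # ascii() behaves like repr() but escapes all non-ASCII characters
    return ascii(text)


line = "caf\u00e9 \u2192 ok"  # sample data containing non-ASCII characters

# repr() keeps printable non-ASCII characters as they are on Python 3,
# so printing it can still fail on a console that cannot encode them
print(console_safe(line))
```

This only changes what is displayed; the decoded string itself is left untouched.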

Upvotes: 0

Sergey Gornostaev

Reputation: 7787

Try specifying the file's encoding: open(os.path.join(root, name), encoding='utf-8')
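A rough sketch of how that could look when you also want to survive undecodable bytes, as the question asks. The helper name read_lines and the sample file are assumptions made for the example, and UTF-8 is only a guess at the real encoding of the configuration files. With errors='replace' each bad byte becomes U+FFFD; errors='ignore' would drop it silently instead.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8", errors="replace"):
    """Read a text file line by line, tolerating undecodable bytes (hypothetical helper)."""
    # An explicit encoding stops Python from falling back to the locale default (cp1252 on Windows)
    with open(path, mode="rt", encoding=encoding, errors=errors) as f:
        return [line.rstrip("\n") for line in f]


# Build a small sample file containing the byte 0x90, which is not valid UTF-8 on its own
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.cfg")
    with open(sample, "wb") as out:
        out.write(b"key=value\nbad=\x90byte\n")
    lines = read_lines(sample)

print(lines)
```

If the files are really in some other encoding, pass that name instead; the error handler only hides the symptom when the encoding guess is wrong.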

Upvotes: 1
