Reputation: 71
I get a UnicodeDecodeError when I loop over the lines of a file:
with open(somefile, 'r') as f:
    for line in f:
        pass  # do something
This happens when I use Python 3.4. I have some files that contain non-UTF-8 bytes. I want to parse each file line by line, find the line where the problem appears, and get the exact index within that line where the non-UTF-8 byte occurs. I already have code for this that works under Python 2.7.9, but under Python 3.4 I get a UnicodeDecodeError when the for loop executes. Any ideas?
Upvotes: 2
Views: 1367
Reputation: 168626
You need to open the file in binary mode and decode the lines one at a time. Try this:
with open('badutf.txt', 'rb') as f:
    for i, line in enumerate(f, 1):
        try:
            line.decode('utf-8')
        except UnicodeDecodeError as e:
            print('Line: {}, Offset: {}, {}'.format(i, e.start, e.reason))
Here is the result I get in Python 3:
Line: 16, Offset: 6, invalid start byte
Sure enough, line 16, position 6 is the bad byte.
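If you want to try this without a broken file at hand, here is a minimal, self-contained sketch of the same approach. The helper name find_bad_utf8 and the sample file contents are made up for illustration; the only thing taken from the code above is the technique of reading in binary mode and decoding each line yourself.

```python
import os
import tempfile


def find_bad_utf8(path):
    """Return (line_number, byte_offset, reason) for the first invalid
    UTF-8 sequence on each offending line of the file at `path`.

    Hypothetical helper, written only to illustrate the technique.
    """
    problems = []
    # Binary mode: Python does not decode anything while iterating.
    with open(path, 'rb') as f:
        for line_number, raw in enumerate(f, 1):
            try:
                raw.decode('utf-8')
            except UnicodeDecodeError as e:
                # e.start is the byte offset of the bad byte within the line.
                problems.append((line_number, e.start, e.reason))
    return problems


# Build a small sample file (made-up data): line 2 has a stray 0xff
# byte at byte offset 3.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'good line\n')
    f.write(b'bad\xffline\n')
    f.write(b'another good line\n')

results = find_bad_utf8(sample_path)
os.remove(sample_path)

for line_number, offset, reason in results:
    print('Line: {}, Offset: {}, {}'.format(line_number, offset, reason))
# prints: Line: 2, Offset: 3, invalid start byte
```

Note that the offset is counted in bytes, not characters, so it can differ from the character position if the line contains valid multi-byte UTF-8 sequences before the bad byte. Also, only the first bad byte on each line is reported.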
Upvotes: 2