Reputation: 16974
Input file: chars.csv
4,,x,,2,,9.012,2,,,,
6,,y,,2,,12.01,±4,,,,
7,,z,,2,,14.01,_3,,,,
When I try to parse this file, I get this error even after specifying utf-8 encoding.
>>> f=open('chars.csv',encoding='utf-8')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 36: invalid start byte
How can I correct this error?
Version: Python 3.2.3
Upvotes: 1
Views: 1313
Reputation: 2036
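The file is not UTF-8 encoded. In UTF-8, ± is encoded as \xC2\xB1 and the character U+0083 as \xC2\x83, whereas the traceback shows a bare 0xb1 byte with no \xC2 lead byte in front of it. As RobertT suggested, try Latin-1: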
f=open('chars.csv',encoding='latin-1')
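To see why Latin-1 works where UTF-8 fails, here is a small sketch that decodes the bytes directly instead of going through open(). The raw value is an assumption: it is modelled on the second row of chars.csv, with ± stored as the single byte 0xB1, which is what the traceback suggests.

```python
# Assumed sample: second row of chars.csv with "±" stored as the byte 0xB1.
raw = b"6,,y,,2,,12.01,\xb14,,,,\n"

# Decoding as UTF-8 fails, because 0xB1 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc.reason)

# Latin-1 maps every byte 0x00-0xFF to one character, so it never raises.
text = raw.decode("latin-1")
print(text)

# For comparison: what a genuinely UTF-8 encoded file would contain for "±".
utf8_bytes = "±".encode("utf-8")
print(utf8_bytes)
```

Because Latin-1 never fails, a clean decode does not prove the file really is Latin-1. It is worth checking that the decoded characters look right.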
Upvotes: 0
Reputation: 4570
Your input file is clearly not UTF-8 encoded, so you have at least these options:
f=open('chars.csv', encoding='utf-8', errors='ignore')
if the given file is mostly UTF-8 and you don't mind a small amount of data loss. For other values of the errors parameter, check the manual.
Upvotes: 3
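As a rough illustration of how the errors parameter changes the result, the sketch below decodes one assumed sample row (± stored as the byte 0xB1, as the traceback suggests) with three standard handlers. It uses bytes.decode rather than open(), but open() accepts the same errors values.

```python
# Assumed sample: one row of chars.csv with "±" stored as the byte 0xB1.
raw = b"6,,y,,2,,12.01,\xb14,,,,\n"

# 'ignore' silently drops bytes that are not valid UTF-8 (data loss).
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD, so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

# 'surrogateescape' keeps the bad byte as a lone surrogate, so the original
# bytes can be recovered by encoding with the same handler.
escaped = raw.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")

print(repr(ignored))
print(repr(replaced))
print(round_trip == raw)
```

With 'ignore' the ± sign disappears from the value, so ±4 silently becomes 4. If the sign matters, decoding with the file's real encoding is the safer choice.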