Reputation: 1275
I am trying to do some simple parsing of a file and get an error caused by special characters:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
infile = 'finance.txt'
input = open(infile)
for line in input:
    if line.startswith(u'▼'):
I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)
Solution?
Upvotes: 1
Views: 1743
Reputation: 85542
You need to provide the encoding. For example, if it is utf-8:
import io
with io.open(infile, encoding='utf-8') as fobj:
    for line in fobj:
        if line.startswith(u'▼'):
This works for Python 2 and 3. By default, Python 2 opens files without any encoding, so reading the content returns byte strings. Comparing such a byte string with a unicode literal like u'▼' makes Python 2 decode the bytes implicitly as ASCII, which fails as soon as a non-ASCII byte such as 0xc2 appears. In Python 3 the default is whatever locale.getpreferredencoding(False) returns, in many cases utf-8. The standard open() in Python 2 does not allow you to specify an encoding. Using io.open() makes the code future-proof because you don't need to change it when switching to Python 3.
In Python 3:
>>> io.open is open
True
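To make the difference concrete, here is a minimal sketch that reproduces the failure and the fix on a throwaway file. The sample content and the temporary path are made up for illustration (they stand in for finance.txt), and the binary read only mimics what Python 2's plain open() would hand back:

```python
# -*- coding: utf-8 -*-
import io
import locale
import os
import tempfile

# Hypothetical sample data standing in for finance.txt.
sample = u"▼ ACME 12.5\n▲ INIT 7.0\n"

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "w", encoding="utf-8") as fobj:
    fobj.write(sample)

# Binary mode yields raw bytes, similar to what Python 2's open() returns.
with io.open(path, "rb") as fobj:
    raw = fobj.read()

# Decoding those bytes as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# With an explicit encoding, every line is a text (unicode) string.
with io.open(path, encoding="utf-8") as fobj:
    down_lines = [line for line in fobj if line.startswith(u"▼")]

# What Python 3 would fall back to if no encoding were given.
default_encoding = locale.getpreferredencoding(False)
```

Passing the encoding explicitly keeps the result independent of whatever default_encoding happens to be on a given machine.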
Upvotes: 5
Reputation: 87134
Open your file with the correct encoding. For example, if your file is UTF-8 encoded, in Python 3:
with open('finance.txt', encoding='utf8') as f:
    for line in f:
        if line.startswith(u'▼'):
            pass  # whatever
With Python 2 you can use io.open() (also works in Python 3):
import io
with io.open('finance.txt', encoding='utf8') as f:
    for line in f:
        if line.startswith(u'▼'):
            pass  # whatever
Upvotes: 3