trialcritic

Reputation: 1275

Python, UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)

I am trying to do some simple parsing of a file and get an error caused by special characters:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

infile = 'finance.txt'
input = open(infile)
for line in input:
  if line.startswith(u'▼'):

I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)

Solution?

Upvotes: 1

Views: 1743

Answers (2)

Mike Müller

Reputation: 85542

You need to provide the encoding when opening the file. For example, if it is utf-8:

import io

with io.open(infile, encoding='utf-8') as fobj:
    for line in fobj:
        if line.startswith(u'▼'):
            pass  # handle the matching line here

This works in both Python 2 and 3. By default, Python 2's open() does no decoding, so reading the file returns byte strings. The error appears when such a byte string meets a unicode string, as in line.startswith(u'▼'): Python 2 implicitly decodes the bytes with the ascii codec, which fails on any byte above 0x7f. In Python 3 the default encoding is whatever locale.getpreferredencoding(False) returns, in many cases utf-8. The built-in open() in Python 2 does not accept an encoding argument. Using io.open() makes the code future-proof, because you don't need to change it when switching to Python 3.
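To see the decoding step in isolation, here is a minimal sketch that needs no file. The sample text is made up for illustration and is not taken from the asker's finance.txt; it contains a no-break space (U+00A0), whose UTF-8 encoding starts with the byte 0xc2, the same byte named in the error message.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text: "up 2<no-break space>% <down-pointing triangle>"
sample = u'up 2\u00a0% \u25bc'

# These are the bytes a UTF-8 file would contain on disk.
raw = sample.encode('utf-8')

# Decoding with the ascii codec fails on the first byte above 0x7f.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # fails at byte 0xc2, index 4

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode('utf-8')
ends_with_triangle = text.endswith(u'\u25bc')
```

Passing encoding='utf-8' to io.open() performs the second kind of decoding for every line, so the comparison with u'▼' is between two text strings and no implicit ascii decode happens.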

In Python 3:

>>> io.open is open
True

Upvotes: 5

mhawke

Reputation: 87134

Open your file with the correct encoding. For example, if your file is UTF-8 encoded, in Python 3:

with open('finance.txt', encoding='utf8') as f:
    for line in f:
        if line.startswith(u'▼'):
            pass  # whatever

With Python 2 you can use io.open() (also works in Python 3):

import io

with io.open('finance.txt', encoding='utf8') as f:
    for line in f:
        if line.startswith(u'▼'):
            pass  # whatever
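If you want to check the approach end to end without touching your real data, the following is a rough sketch. The helper name lines_starting_with and the temporary sample file are assumptions made for this example, and it presumes the file really is UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def lines_starting_with(path, prefix, encoding='utf-8'):
    """Return the decoded lines of the file at `path` that start with `prefix`.

    Hypothetical helper for illustration; `encoding` must match the file.
    """
    with io.open(path, encoding=encoding) as f:
        return [line for line in f if line.startswith(prefix)]


# Write a small UTF-8 sample file (made-up content, not finance.txt).
sample = u'\u25bc falling\nup 2\u00a0%\n\u25bc also falling\n'
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as f:
    f.write(sample)

# Read it back as text and keep only the lines that begin with the triangle.
matches = lines_starting_with(sample_path, u'\u25bc')
os.remove(sample_path)
```

The same function runs unchanged on Python 2 and Python 3, because io.open() returns text (unicode) lines on both.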

Upvotes: 3
