Python throws ascii codec can't encode when parsing xml

Question

I was running the following code in Python:

import xml.etree.ElementTree as ET
tree = ET.parse('dplp_11.xml')
root = tree.getroot()
f = open('workfile', 'w')
for country in root.findall('article'):
    rank = country.find('year').text
    name = country.find('title').text

    if(int(rank)>2009):
        f.write(name)
        auth = country.findall('author')
        for a in auth:
            #print str(a)
            f.write(a.text)
            f.write(',')
        f.write('\n')

I got an error:

Traceback (most recent call last):
  File "parser.py", line 14, in <module>
    f.write(a.text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128)

I was trying to parse the DBLP data, which looks like this:




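<article>
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
<article>
<author>Symeon Bozapalidis</author>
<author>Zoltán Fülöp 0001</author>
<author>George Rahonis</author>
<title>Equational weighted tree transformations.</title>
<pages>29-52</pages>
<year>2012</year>
<volume>49</volume>
<journal>Acta Inf.</journal>
<number>1</number>
<ee>http://dx.doi.org/10.1007/s00236-011-0148-5</ee>
<url>db/journals/acta/acta49.html#BozapalidisFR12</url>
</article>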

How can I resolve it?

Martijn Pieters · Accepted Answer

a.text is a Unicode object, but you are trying to write it to a plain Python 2 file object:

f.write(a.text)

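In Python 2, the f.write() method of a plain file object only accepts a byte string (type str). Passing a Unicode object triggers an implicit encode with the ASCII codec, which raises your exception as soon as the text contains a character that ASCII cannot represent, such as u'\xe1'.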
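To see that implicit step in isolation, here is a minimal sketch that encodes one of the author names from the sample data by hand. The variable names are illustrative and not part of the original script.

```python
# One of the author names from the sample data, as a Unicode string.
name = u'Zolt\xe1n F\xfcl\xf6p 0001'

# Encoding to ASCII fails at the first non-ASCII character
# (u'\xe1' at position 4), just like the implicit encode in f.write().
try:
    name.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding to UTF-8 succeeds and yields a byte string that a
# plain file object can write.
utf8_bytes = name.encode('utf8')
```

The failing position matches the "position 4" reported in the traceback above.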

You'll need to either encode the text explicitly with a codec that can represent your data, or use an io.open() file object that does the encoding for you.

Encoding explicitly to UTF-8 would work, for example:

f.write(a.text.encode('utf8'))

or use io.open() with an explicit encoding:

import io

# ...

f = io.open('workfile', 'w', encoding='utf8')

after which everything you pass to f.write() must be a Unicode object; prefix any string literals with u:

for a in auth:
    f.write(a.text)
    f.write(u',')
f.write(u'\n')
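Putting the io.open() variant together, the whole script could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in replacement: SAMPLE is a trimmed, hypothetical stand-in for the DBLP file, write_recent_articles and as_text are made-up helper names, and the output goes to a temporary directory. The as_text helper is there because, as far as I know, ElementTree on Python 2 returns ASCII-only text as a byte string, which an io.open() file would reject.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the DBLP file; the real script
# would call ET.parse() on the file instead.
SAMPLE = (
    u'<dblp>'
    u'<article><author>Sanjeev Saxena</author>'
    u'<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>'
    u'<year>1996</year></article>'
    u'<article><author>Symeon Bozapalidis</author>'
    u'<author>Zolt\xe1n F\xfcl\xf6p 0001</author>'
    u'<author>George Rahonis</author>'
    u'<title>Equational weighted tree transformations.</title>'
    u'<year>2012</year></article>'
    u'</dblp>'
)


def as_text(value):
    # Assumption: on Python 2, ASCII-only element text may arrive as a
    # byte string; decode it so the io.open() file accepts it.
    # On Python 3 the text is already Unicode and is returned unchanged.
    return value.decode('ascii') if isinstance(value, bytes) else value


def write_recent_articles(root, path, min_year=2009):
    """Write title and authors of articles newer than min_year as UTF-8."""
    with io.open(path, 'w', encoding='utf8') as f:
        for article in root.findall('article'):
            if int(article.find('year').text) > min_year:
                f.write(as_text(article.find('title').text))
                for author in article.findall('author'):
                    f.write(as_text(author.text))
                    f.write(u',')
                f.write(u'\n')


root = ET.fromstring(SAMPLE.encode('utf8'))
out_path = os.path.join(tempfile.mkdtemp(), 'workfile')
write_recent_articles(root, out_path)

# Read the file back to check that the non-ASCII names survived.
with io.open(out_path, encoding='utf8') as f:
    result = f.read()
```

The output format is kept the same as in the question (title, then comma-terminated authors, one article per line); only the file handling changes.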
