Johnzzz

Reputation: 119

ElementTree cannot parse Unicode XML

I have the following XML:

<Earth>
 <country name="Česká republika" population="8900000">
    <capital>Praha1</capital>        
  </country>
</Earth>

But when I try to parse it, it fails with this error:

 xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 20

My code:

 import xml.etree.ElementTree as etree
 tree = etree.parse(input)  # input -> file.xml

Upvotes: 0

Views: 3377

Answers (1)

javawizard

Reputation: 1284

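As arhimmel pointed out, this is most likely an encoding problem: the parser stops at the first non-ASCII character (the Č in the name attribute), which suggests the bytes in the file are not in the encoding the parser expects. etree.parse accepts file-like objects as well as paths, so you could try adding import codecs at the top of your code and then replacing input with codecs.open("file.xml", encoding="UTF-8").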
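For illustration, here is a minimal sketch of that idea, assuming Python 3. The helper name parse_with_encoding is hypothetical (not part of ElementTree), and the encoding you pass is a guess you have to supply yourself. io.open is used here instead of codecs.open; both return a file-like object that decodes while reading.

```python
import io
import xml.etree.ElementTree as etree


def parse_with_encoding(path, encoding="utf-8"):
    """Open the file as text in the given encoding and parse it.

    `encoding` is an assumption about how the file was saved;
    ElementTree is handed already-decoded text.
    """
    with io.open(path, encoding=encoding) as handle:
        # etree.parse accepts a file-like object as well as a path.
        return etree.parse(handle)
```

Usage would look like tree = parse_with_encoding("file.xml"). If the file is not actually UTF-8, the read fails with a UnicodeDecodeError, which is at least a clearer hint than the ParseError. In that case, trying the encoding the file was likely saved in (for Czech text on Windows, possibly "cp1250") may be worth a shot.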

Upvotes: 1
