Reputation: 273
Trying to use the ElementTree to parse xml files; since by default the parser does not retain comments, used the following code from http://bugs.python.org/issue8277:
import xml.etree.ElementTree as etree
class CommentedTreeBuilder(etree.TreeBuilder):
"""A TreeBuilder subclass that retains comments."""
def comment(self, data):
self.start(etree.Comment, {})
self.data(data)
self.end(etree.Comment)
parser = etree.XMLParser(target = CommentedTreeBuilder())
The above is in documents.py. Tested with:
class TestDocument(unittest.TestCase):
def setUp(self):
filename = os.path.join(sys.path[0], "data", "facilities.xml")
self.doc = etree.parse(filename, parser = documents.parser)
def testClass(self):
print("Class is {0}.".format(self.doc.__class__.__name__))
#commented out tests.
if __name__ == '__main__':
unittest.main()
This barfs with:
Traceback (most recent call last):
File "/home/goncalo/documents/games/ja2/modding/mods/xml-overhaul/src/scripts/../tests/test_documents.py", line 24, in setUp
self.doc = etree.parse(filename, parser = documents.parser)
File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 1242, in parse
tree.parse(source, parser)
File "/usr/lib/python3.3/xml/etree/ElementTree.py", line 1726, in parse
parser.feed(data)
IndexError: pop from empty stack
What am I doing wrong? By the way, the xml in the file is valid (as checked by an independent program) and in utf-8 encoding.
note(s):
edit: here is the sample xml file used; it is very small (let's see if I can get the formatting right):
<?xml version="1.0" encoding="utf-8"?>
<!-- changes to facilities.xml by G. Rodrigues: ar overhaul.-->
<SECTORFACILITIES>
<!-- Drassen -->
<!-- Small airport -->
<FACILITY>
<SectorGrid>B13</SectorGrid>
<FacilityType>4</FacilityType>
<ubHidden>0</ubHidden>
</FACILITY>
</SECTORFACILITIES>
Upvotes: 4
Views: 2551
Reputation: 32620
The example XML you added works for me in 2.7, but breaks on 3.3 with the stack trace you described.
The problem seems to be the very first comment - after the XML declaration, before the first element. It isn't part of the tree in 2.7 (doesn't raise an Exception though), and causes the exception in 3.3.
See Python issue #17901: In Python 3.4, which contains the mentioned fix, pop from empty stack
doesn't occur, but ParseError: multiple elements on top level
is raised instead.
Which makes sense: If you want to retain the comments in the tree, they need to be trated as nodes. And XML only allows one node at the top level of the document, so you can't have a comment before the first "real" element (if you force the parser to retain commments).
So unfortunately I think that's your only option: Remove those comments outside the root document node from your XML files - either in the original files, or by stripping them before parsing.
Upvotes: 2