jdelange

Reputation: 801

Python XML: ParseError: junk after document element

I am trying to parse an XML file into an ElementTree:

>>> import xml.etree.cElementTree as ET
>>> tree = ET.ElementTree(file='D:\Temp\Slikvideo\JPEG\SV_4_1_mask\index.xml')

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Anaconda2\lib\xml\etree\ElementTree.py", line 611, in __init__
    self.parse(file)
  File "<string>", line 38, in parse
ParseError: junk after document element: line 3, column 0

The XML file starts like this:

<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1" />
<node UID="OBJECT_2016080819041580480127">
    <source UID="OBJECT_2016080819041550469454" />
    <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
    <properties file="sicaaa" />
</node>
<node UID="OBJECT_2016080819041512769572">
    <source UID="OBJECT_2016080819041598947781" />
    <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
    <properties file="ticaaa" />
</node>

followed by many more nodes.

I do not see any junk at line 3, column 0, so I assume there must be another reason for the error.

The .xml file is generated by external software (MITK), so I assumed it would be valid.

Working on Win 7, 64 bit, VS2015, Anaconda

Upvotes: 26

Views: 45830

Answers (3)

Martin Valgur

Reputation: 6302

As @Matthias Wiehl said, ElementTree expects a single root node. Your file has several top-level elements, so it is not well-formed XML, and that should ideally be fixed at its origin. As a workaround, you can wrap the document in a fake root node.

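import re
import xml.etree.ElementTree as ET  # works on Python 2 and 3; cElementTree was removed in Python 3.9

with open("index.xml") as f:
    xml = f.read()

# Insert <root> right after the XML declaration and close it at the end of the file.
# ET.fromstring returns the root Element, not an ElementTree.
root = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml, count=1) + "</root>")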
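The same fake-root idea can also be sketched without a regex. This is only an illustration: `parse_with_fake_root` and its `root_tag` parameter are hypothetical names, not part of ElementTree, and the sample data is a shortened stand-in for the file in the question. It assumes the content is read as bytes and that any XML declaration sits at the very start.

```python
import xml.etree.ElementTree as ET


def parse_with_fake_root(data, root_tag=b"root"):
    """Wrap all top-level elements in one root element and parse the result.

    `data` is the raw file content as bytes. Hypothetical helper for illustration.
    """
    data = data.lstrip()
    declaration = b""
    if data.startswith(b"<?xml"):
        # Keep the XML declaration in front; it must stay at the very start.
        end = data.index(b"?>") + 2
        declaration, data = data[:end], data[end:]
    wrapped = declaration + b"<" + root_tag + b">" + data + b"</" + root_tag + b">"
    return ET.fromstring(wrapped)


# Shortened stand-in for the file in the question (attributes trimmed).
sample = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<Version FileVersion="1" />\n'
    b'<node UID="OBJECT_2016080819041580480127" />\n'
    b'<node UID="OBJECT_2016080819041512769572" />\n'
)
root = parse_with_fake_root(sample)
uids = [node.get("UID") for node in root.findall("node")]
```

For a real file, the bytes would come from something like `open("index.xml", "rb").read()`. Note that the fake root exists only in memory; the file on disk is unchanged and still not well-formed.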

Upvotes: 40

Raja Sattiraju

Reputation: 1272

Try repairing the document like this: make Version the root element by leaving its tag open on line 2, and close it at the end of the file, after the last node.

<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1">
    <node UID="OBJECT_2016080819041580480127">
        <source UID="OBJECT_2016080819041550469454" />
        <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
        <properties file="sicaaa" />
    </node>
    <node UID="OBJECT_2016080819041512769572">
        <source UID="OBJECT_2016080819041598947781" />
        <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
        <properties file="ticaaa" />
    </node>
</Version>
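Once Version encloses the nodes, the document parses normally. A minimal sketch, assuming the repaired layout above (the Writer and Revision attributes are trimmed here for brevity, and the variable names are only illustrative):

```python
import xml.etree.ElementTree as ET

# Repaired layout: Version is the single root and contains every node.
repaired = b"""<?xml version="1.0" encoding="UTF-8" ?>
<Version FileVersion="1">
    <node UID="OBJECT_2016080819041580480127">
        <source UID="OBJECT_2016080819041550469454" />
        <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
        <properties file="sicaaa" />
    </node>
    <node UID="OBJECT_2016080819041512769572">
        <source UID="OBJECT_2016080819041598947781" />
        <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
        <properties file="ticaaa" />
    </node>
</Version>
"""

root = ET.fromstring(repaired)

# Map each node's UID to the file named in its <data> child.
files = {node.get("UID"): node.find("data").get("file") for node in root.iter("node")}
```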

Upvotes: 0

Matthias Wiehl

Reputation: 1998

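The root node of your document (Version) is opened and closed on line 2. The parser does not expect any nodes after the root node, so the node element that starts on line 3 is reported as junk. One solution is to remove the closing forward slash from the Version tag so that it stays open, and to close it with </Version> at the end of the file.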
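The error can be reproduced with a much smaller document, which makes the cause easier to see. A minimal sketch, with a made-up three-line document standing in for the real file:

```python
import xml.etree.ElementTree as ET

# Line 2 opens and closes the root element; line 3 starts a second top-level element.
doc = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<Version FileVersion="1" />\n'
    b'<node UID="a" />\n'
)

try:
    ET.fromstring(doc)
except ET.ParseError as err:
    message = str(err)       # human-readable description of the problem
    position = err.position  # (line, column) where the parser gave up
```

Here `position` points at the start of the second top-level element, which matches the "line 3, column 0" in the question's traceback.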

Upvotes: 3
