prashantgpt91
prashantgpt91

Reputation: 1795

how to parse xml with multiple root element

I need to parse both var & group root elements.

Code

import xml.etree.ElementTree as ET
tree_ownCloud = ET.parse('0020-syslog_rules.xml')
root = tree_ownCloud.getroot()

Error

xml.etree.ElementTree.ParseError: junk after document element: line 17, column 0

Sample XML

<var name="BAD_WORDS">core_dumped|failure|error|attack| bad |illegal |denied|refused|unauthorized|fatal|failed|Segmentation Fault|Corrupted</var>

<group name="syslog,errors,">
  <rule id="1001" level="2">
    <match>^Couldn't open /etc/securetty</match>
    <description>File missing. Root access unrestricted.</description>
    <group>pci_dss_10.2.4,gpg13_4.1,</group>
  </rule>

  <rule id="1002" level="2">
    <match>$BAD_WORDS</match>
    <options>alert_by_email</options>
    <description>Unknown problem somewhere in the system.</description>
    <group>gpg13_4.3,</group>
  </rule>
</group>

I tried following couple of other questions on stackoverflow here, but none helped.

I know the reason, due to which it is not getting parsed, people have usually tried hacks. IMO it's a very common usecase to have multiple root elements in XML, and something must be there in ET parsing library to get this done.

Upvotes: 1

Views: 3609

Answers (2)

Michael Kay
Michael Kay

Reputation: 163312

Actually, the example data is not a well-formed XML document, but it is a well-formed XML entity. Some XML parsers have an option to accept an entity rather than a document, and in XPath 3.1 you can parse this using the parse-xml-fragment() function.

Another way to parse a fragment like this is to create a wrapper document which references it as an external entity:

<!DOCTYPE wrapper [
<!ENTITY e SYSTEM "fragment.xml">
]>
<wrapper>&e;</wrapper>

and then supply this wrapper document as the input to your XML parser.

Upvotes: 3

Grzegorz Oledzki
Grzegorz Oledzki

Reputation: 24251

As mentioned in the comment, an XML file cannot have multiple roots. Simple as that.

If you do receive/store data in this format (and then it's not proper XML). You could consider a hack of surrounding what you have with a fake tag, e.g.

import xml.etree.ElementTree as ET

with open("0020-syslog_rules.xml", "r") as inputFile: 
  fileContent = inputFile.read()
  root = ET.fromstring("<fake>" + fileContent +"</fake>")
  print(root)

Upvotes: 5

Related Questions