HotFuzz

Reputation: 39

Parse XML file with namespace with Python

I have a complex XML file that I'm trying to extract data from.

<?xml version="1.0" ?>
<root xmlns="something.something.com">
    <Save>
        <AdditionalInfo>
            <Name></Name>
            <Time></Time>
            <UtilityVersion></UtilityVersion>
            <XMLVersion></XMLVersion>
            <PluginName></PluginName>
            <ClassName></ClassName>
        </AdditionalInfo>
        <Data>
            <session>
                <xyDataObjects>
                    <xyData Key="'info'" ObjectType="moreinfo" Type="evenmoreinfo">
                        <axis1QuantityType ObjectType="guesswhat" Type="info!">
                            <label></label>
                            <type></type>
                        </axis1QuantityType>
    ... and so on and so on

The file contains multiple blocks delimited by <Save> and </Save> tags, and the info I'm looking for can be nested as deep as the <label> element, or even deeper.

Element.iter() seemed to be my solution, as it would iterate through every Save block and find the <label> info I am looking for, but unfortunately it doesn't accept a namespace argument.

What are my other options? I'm trying to keep my code flexible, as I foresee that the structure of the XML file could change in the future. I also want to keep it simple, so I would rather not implement something like:

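import xml.etree.ElementTree as ET

tree = ET.parse('dblank.xml')
root = tree.getroot()
# Hard-coded child indices: Save -> Data -> session -> ... -> label
Array = [None] * len(root)
for i in range(len(root)):
    Array[i] = root[i][1][0][0][0][0][0].text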

Upvotes: 1

Views: 2442

Answers (1)

Valdi_Bo

Reputation: 30971

When you process XML with namespaces, you must specify the namespaces used. To this end I:

  • defined the ns variable (a dictionary) with namespace prefixes as keys and full namespace URIs as values (a single dictionary entry here),
  • passed this variable as the second argument to findall.

Note also that the element name in the first argument of findall is prefixed with some:, the shortcut defined in ns.

Try the following code:

import xml.etree.ElementTree as et

tree = et.parse('Input.xml')
root = tree.getroot()
ns = {'some': 'something.something.com'}

for elem in root.findall('.//some:label', ns):
    print(elem.text)

Of course, this is only an example of how to refer to an existing element. Change it according to your needs.
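
Since the question mentions iter and processing each Save block separately, here is a minimal sketch of both, under a few assumptions: the inline XML is a hypothetical, shortened sample that only mirrors the structure in the question (the label texts "Time" and "Force" are made up), and NS_URI is just a helper name introduced here. iter has no namespace argument, but it does accept a tag name qualified with the namespace URI in curly braces.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the question's structure; label texts are invented.
SAMPLE = """<?xml version="1.0" ?>
<root xmlns="something.something.com">
    <Save>
        <Data><session><xyDataObjects>
            <xyData Key="a"><axis1QuantityType><label>Time</label></axis1QuantityType></xyData>
        </xyDataObjects></session></Data>
    </Save>
    <Save>
        <Data><session><xyDataObjects>
            <xyData Key="b"><axis1QuantityType><label>Force</label></axis1QuantityType></xyData>
        </xyDataObjects></session></Data>
    </Save>
</root>"""

NS_URI = "something.something.com"  # helper name assumed for this sketch
ns = {"some": NS_URI}

root = ET.fromstring(SAMPLE)

# Option 1: iter() with the fully qualified tag, i.e. "{namespace-uri}label".
labels_iter = [el.text for el in root.iter(f"{{{NS_URI}}}label")]

# Option 2: walk each Save block and search below it at any depth,
# so the result does not depend on how deeply label is nested.
labels_by_save = [
    [el.text for el in save.findall(".//some:label", ns)]
    for save in root.findall("some:Save", ns)
]

print(labels_iter)
print(labels_by_save)
```

Both variants locate label by name rather than by position, so they should keep working if intermediate elements are added or removed later.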

Upvotes: 2
