Parse XML file with namespace with Python

Question

I have a complex XML file I'm trying to extract data from.

    ... and so on and so on

The file has multiple blocks, each starting with <Save> and ending with </Save>, and the info I'm looking for can be as deep as the label element, or even deeper.

ElementTree.iter() seemed to be my solution, as it would iterate through every Save block and find the info I am looking for, but unfortunately it doesn't accept a namespace argument.

What are my other options? I'm trying to keep my code flexible, as I foresee that the structure of the XML file could change in the future, and simple, so I would rather not implement something like:

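import xml.etree.ElementTree as ET

tree = ET.parse('dblank.xml')
root = tree.getroot()
# Pre-size the list so it can be filled by index
Array = [None] * len(root)
for i in range(len(root)):
    # Hard-coded path: breaks as soon as the nesting changes
    Array[i] = root[i][1][0][0][0][0][0].text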

Valdi_Bo · Accepted Answer

When you process XML with namespaces, you must specify the namespaces used. To this end I:

- defined the ns variable (a dictionary) with namespace prefixes as keys and full namespace URIs as values (a single entry here),
- passed this variable as the second argument to findall.

Note also that, in the first argument of findall, the element name starts with the some: prefix.
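As a minimal sketch of what the prefix does: the XML snippet and its element layout here are made up for illustration, and only the namespace URI matches the one used in this answer. The prefix some is just a local alias. ElementTree expands some:label to the fully qualified name {something.something.com}label, so both spellings should find the same elements.

```python
import xml.etree.ElementTree as et

# Hypothetical sample; the real file's structure may differ.
SAMPLE = """<root xmlns="something.something.com">
  <Save><label>first</label></Save>
  <Save><label>second</label></Save>
</root>"""

root = et.fromstring(SAMPLE)
ns = {'some': 'something.something.com'}

# Prefixed form: 'some' is looked up in the ns dictionary.
by_prefix = [e.text for e in root.findall('.//some:label', ns)]

# Equivalent fully qualified form; no dictionary is needed.
by_uri = [e.text for e in root.findall('.//{something.something.com}label')]

print(by_prefix)
print(by_uri)
```

The prefix name itself is arbitrary; it does not have to match whatever prefix (if any) the XML file uses, as long as the URI is the same.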

Try the following code:

import xml.etree.ElementTree as et

tree = et.parse('Input.xml')
root = tree.getroot()
ns = {'some': 'something.something.com'}

for elem in root.findall('.//some:label', ns):
    print(elem.text)

Of course, this is only an example of how to refer to an existing element. Change it according to your needs.
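Regarding iter() from the question: it takes no namespace dictionary, but it does accept a fully qualified tag name in {uri}tag form. The sketch below assumes a made-up layout in which each Save block holds a label at a varying depth; the function name labels_per_save and the sample data are hypothetical. It walks every Save block and picks up the first label inside it, whatever the nesting level.

```python
import xml.etree.ElementTree as et

NS_URI = 'something.something.com'

# Hypothetical sample: label sits at a different depth in each Save block.
SAMPLE = f"""<root xmlns="{NS_URI}">
  <Save><a><label>one</label></a></Save>
  <Save><a><b><c><label>two</label></c></b></a></Save>
  <Save><a/></Save>
</root>"""


def labels_per_save(root):
    """Return the text of the first label in each Save block (None if absent)."""
    results = []
    # iter() accepts a qualified name, so no namespace dictionary is needed.
    for save in root.iter(f'{{{NS_URI}}}Save'):
        # './/' searches all descendants, so the depth does not matter.
        label = save.find(f'.//{{{NS_URI}}}label')
        results.append(label.text if label is not None else None)
    return results


root = et.fromstring(SAMPLE)
print(labels_per_save(root))
```

Because the lookup relies on element names rather than fixed indexes such as root[i][1][0][0], it should keep working if the nesting depth changes.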
