How to get (parse) a subchild in XML from Python

Question

I am new to Python and coding, so please be patient with my question.

So here's my busy XML:

    

    999
    
        
        
            pass
          
        
            V
            
                
                    jack
                    smiths
                
                
                100 rodeo dr
                long beach
                ca
                90802
                
                
                    123456789
                    ca
                
                
                x@me.com
                
                    0000000000
                    1111111111
                
                
            
            Regular

What I am trying to do is extract these values into a nice, clean table like the one below:
[image: the desired output table]

Here's my code so far, but I couldn't figure out how to get the subchild:

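import os
import xml.etree.ElementTree as ET

os.chdir('d:/py/xml/')

tree = ET.parse('xxml.xml')
root = tree.getroot()
x = root.tag     # tag name of the root element
y = root.attrib  # dict of the root element's attributes
print(x, y)

#---PRINT ALL NODES---
# (iterating over root visits only its direct children)
for child in root:
    print(child.tag, child.attrib)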
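A side note on why the loop above stops at the first level: iterating over an element yields only its direct children, while `Element.iter()` walks every descendant. The sketch below uses a made-up minimal document built from the only tag names mentioned in this thread (`Total`, `ID`, `Response`, `Detail`, `Nix`, `Check`). The real file's structure is an assumption, and `ET.fromstring` is used instead of a file only so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the question's file: the real tags were lost,
# so only the tag names mentioned in this thread are used here.
XML = """<Total>
  <ID>999</ID>
  <Response>
    <Detail>
      <Nix>
        <Check>pass</Check>
      </Nix>
    </Detail>
  </Response>
</Total>"""

root = ET.fromstring(XML)

# Iterating over root visits direct children only.
direct = [child.tag for child in root]
print(direct)  # ['ID', 'Response']

# root.iter() walks the root and all its descendants in document order.
everything = [el.tag for el in root.iter()]
print(everything)  # ['Total', 'ID', 'Response', 'Detail', 'Nix', 'Check']
```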

Thank you in advance !

jfs · Accepted Answer

You could create a dictionary that maps the column names to XPath expressions that extract the corresponding values, e.g.:

xpath = {
  "ID": "/Total/ID/text()",
  "Check": "/Total/Response/Detail/Nix/Check/text()", # or "//Check/text()"
}

To populate the table row:

row = {name: tree.xpath(path) for name, path in xpath.items()}

The above assumes that you use lxml, which supports the full XPath syntax (note that `tree.xpath()` returns a list of matches, not a single string). ElementTree supports only a subset of XPath expressions, but it might be enough in your case (drop the "text()" step and use el.text instead), e.g.:

xpath = {
  "ID": ".//ID",
  "Check": ".//Check",
}
row = {name: tree.findtext(path) for name, path in xpath.items()}
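A self-contained version of the ElementTree approach, run against an assumed minimal document (only the `ID` and `Check` tag names come from this thread; everything else about the real file is unknown). `findtext()` returns the text of the first matching element, or `None` when nothing matches:

```python
import xml.etree.ElementTree as ET

# Assumed minimal document; the question's real XML lost its tags.
XML = """<Total>
  <ID>999</ID>
  <Response><Detail><Nix><Check>pass</Check></Nix></Detail></Response>
</Total>"""

tree = ET.ElementTree(ET.fromstring(XML))

# Column name -> path in ElementTree's limited XPath subset.
xpath = {
    "ID": ".//ID",
    "Check": ".//Check",
}

# One table row: each column gets the text of the first matching element.
row = {name: tree.findtext(path) for name, path in xpath.items()}
print(row)  # {'ID': '999', 'Check': 'pass'}
```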

To print all text with corresponding tag names:

import xml.etree.ElementTree as etree  # cElementTree is deprecated and was removed in Python 3.9

for _, el in etree.iterparse("xxml.xml"):
    if el.text and len(el) == 0:  # leaf element with text
        print(el.tag, el.text)
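One caveat if you collect the leaves into a `{tag: text}` dict instead of printing them: the question's data appears to contain two values of the same kind (`0000000000` and `1111111111`), and a plain dict would keep only the last one. A sketch that keeps every value per tag, assuming hypothetical `Phones`/`Phone` tags since the real names were lost:

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment; the real tag names are unknown.
XML = b"""<Total>
  <ID>999</ID>
  <Phones>
    <Phone>0000000000</Phone>
    <Phone>1111111111</Phone>
  </Phones>
</Total>"""

leaves = defaultdict(list)
# iterparse yields ("end", element) pairs by default, once each element closes.
for _, el in ET.iterparse(io.BytesIO(XML)):
    # len(el) == 0 means no child elements, so whitespace-only text
    # on container elements such as <Phones> is skipped.
    if el.text and len(el) == 0:
        leaves[el.tag].append(el.text)

print(dict(leaves))
```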

If the column names differ from the tag names (as in your case), the last example is not enough to build the table.
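One way to handle that, sketched under assumptions: keep an explicit mapping from each column header to a path, extract one row per document, and write the rows with the standard `csv` module. The headers `Record ID` and `Check Result` and the helper `extract_row` are hypothetical, and the document is an assumed minimal reconstruction using only the tag names mentioned in this answer.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed minimal document (the question's real tag names were lost).
XML = """<Total>
  <ID>999</ID>
  <Response><Detail><Nix><Check>pass</Check></Nix></Detail></Response>
</Total>"""

# Hypothetical column headers that differ from the tag names; each maps
# to the ElementTree path (relative to the root) that finds its value.
columns = {
    "Record ID": "./ID",
    "Check Result": "./Response/Detail/Nix/Check",
}

def extract_row(root, columns):
    """Return one table row: header -> text of the first match, or "" if absent."""
    return {name: root.findtext(path, default="") for name, path in columns.items()}

rows = [extract_row(ET.fromstring(XML), columns)]

# Write the table as CSV into an in-memory buffer.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(columns))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

For several files, you could call `extract_row(ET.parse(path).getroot(), columns)` once per file and pass the whole list to `writerows()`.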
