Reputation: 5183
I am new to Python and to coding in general, so please be patient with my question.
Here is my busy XML:
<?xml version="1.0" encoding="utf-8"?>
<Total>
<ID>999</ID>
<Response>
<Detail>
<Nix>
<Check>pass</Check>
</Nix>
<MaxSegment>
<Status>V</Status>
<Input>
<Name>
<First>jack</First>
<Last>smiths</Last>
</Name>
<Address>
<StreetAddress1>100 rodeo dr</StreetAddress1>
<City>long beach</City>
<State>ca</State>
<ZipCode>90802</ZipCode>
</Address>
<DriverLicense>
<Number>123456789</Number>
<State>ca</State>
</DriverLicense>
<Contact>
<Email>[email protected]</Email>
<Phones>
<Home>0000000000</Home>
<Work>1111111111</Work>
</Phones>
</Contact>
</Input>
<Type>Regular</Type>
</MaxSegment>
</Detail>
</Response>
</Total>
What I am trying to do is extract these values into a nice, clean table.
Here's my code so far, but I couldn't figure out how to get the sub-children:
import os
os.chdir('d:/py/xml/')
import xml.etree.ElementTree as ET
tree = ET.parse('xxml.xml')
root=tree.getroot()
x = root.tag
y = root.attrib
print(x,y)
#---PRINT ALL NODES---
for child in root:
    print(child.tag, child.attrib)
Thank you in advance!
Upvotes: 1
Views: 2852
Reputation: 414795
You could create a dictionary that maps column names to XPath expressions that extract the corresponding values, e.g.:
xpath = {
    "ID": "/Total/ID/text()",
    "Check": "/Total/Response/Detail/Nix/Check/text()",  # or "//Check/text()"
}
To populate the table row:
row = {name: tree.xpath(path) for name, path in xpath.items()}
The above assumes that you use lxml, which supports the full XPath syntax (note that tree.xpath() returns a list of matches for each expression). ElementTree supports only a subset of XPath, but it might be enough in your case (you could drop the "text()" step and use el.text instead), e.g.:
xpath = {
    "ID": ".//ID",
    "Check": ".//Check",
}
row = {name: tree.findtext(path) for name, path in xpath.items()}
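As a minimal sketch of the ElementTree variant, the snippet below builds one row from a trimmed copy of the question's XML. The XML is inlined as a string (rather than read from the file) so the example runs on its own, and the column names such as `AddressState` and `LicenseState` are hypothetical. Longer paths are used where a tag like `State` occurs more than once.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch runs on its own.
XML = """<Total>
  <ID>999</ID>
  <Response>
    <Detail>
      <Nix><Check>pass</Check></Nix>
      <MaxSegment>
        <Input>
          <Name><First>jack</First><Last>smiths</Last></Name>
          <Address><City>long beach</City><State>ca</State></Address>
          <DriverLicense><Number>123456789</Number><State>ca</State></DriverLicense>
        </Input>
      </MaxSegment>
    </Detail>
  </Response>
</Total>"""

# Hypothetical column names mapped to paths relative to the root <Total> element.
columns = {
    "ID": "ID",
    "Check": ".//Check",
    "FirstName": ".//Name/First",
    "AddressState": ".//Address/State",
    "LicenseState": ".//DriverLicense/State",
    "LicenseNumber": ".//DriverLicense/Number",
}

root = ET.fromstring(XML)
# findtext() returns the text of the first match, or None if nothing matches.
row = {name: root.findtext(path) for name, path in columns.items()}
print(row)
```

A row built this way could be passed to `csv.DictWriter.writerow()` to produce the table, with one such dict per XML file.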
To print all text with corresponding tag names:
import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

for _, el in etree.iterparse("xxml.xml"):
    if el.text and len(el) == 0:  # leaf element with text
        print(el.tag, el.text)
If column names differ from tag names (as in your case) then the last example is not enough to build the table.
Upvotes: 2
Reputation: 15345
This is how you could traverse the tree and print only the text nodes:
def traverse(node):
    show = True
    for c in node:  # iterate over direct children (getchildren() was removed in Python 3.9)
        show = False
        traverse(c)
    if show:  # no children, so this is a leaf
        print(node.tag, node.text)
For your example I get the following:
traverse(root)
ID 999
Check pass
Status V
First jack
Last smiths
StreetAddress1 100 rodeo dr
City long beach
State ca
ZipCode 90802
Number 123456789
State ca
Email [email protected]
Home 0000000000
Work 1111111111
Type Regular
Instead of printing, you could store (node.tag, node.text) tuples in a list, or store {node.tag: node.text} in a dict (repeated tags such as State would then overwrite each other).
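A rough sketch of that idea follows. Because `State` appears under both `Address` and `DriverLicense`, this version keys each leaf by its full path instead of the bare tag. The helper name `leaf_items` and the shortened XML fragment (with flatter nesting than the real file) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened fragment in the spirit of the question's XML (nesting simplified).
XML = (
    "<Total><ID>999</ID><Input>"
    "<Address><City>long beach</City><State>ca</State></Address>"
    "<DriverLicense><Number>123456789</Number><State>ca</State></DriverLicense>"
    "</Input></Total>"
)

def leaf_items(node, path=""):
    """Yield (path, text) for every leaf element at or below node."""
    here = f"{path}/{node.tag}" if path else node.tag
    if len(node) == 0:  # no children, so this is a leaf
        yield here, node.text
    for child in node:
        yield from leaf_items(child, here)

root = ET.fromstring(XML)
pairs = list(leaf_items(root))  # list of (path, text) tuples, in document order
by_path = dict(pairs)           # full paths are unique here, so nothing is lost
by_tag = {path.rsplit("/", 1)[-1]: text for path, text in pairs}  # State collides

for path, text in pairs:
    print(path, text)
```

Keying by tag alone yields one entry fewer than keying by path, since the two `State` elements share a key.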
Upvotes: 2