Reputation: 13
I need to parse an XML file and build a record-based output from the data. The problem is that the XML is in a "generic" form, in that it has several levels of nested "node" elements that represent some sort of data structure. I need to build the records dynamically based on the deepest level of the "node" element. Some example XML and expected output are at the bottom.
I am most familiar with Python's ElementTree, so I'd prefer to use that, but I just can't wrap my head around how to build each output record dynamically when the node depth varies. We can't assume the nested nodes will be a fixed number of levels deep, so hardcoding one loop per level isn't possible. Is there a way to parse the XML and build the output on the fly?
Some Additional Notes:
Any ideas / input would be greatly appreciated.
<root>
  <node>101
    <node>A
      <node>PlanA
        <node>default
          <rate>100.00</rate>
        </node>
        <node>alternative
          <rate>90.00</rate>
        </node>
      </node>
    </node>
  </node>
  <node>102
    <node>B
      <node>PlanZZ
        <node>Group 1
          <node>default
            <rate>100.00</rate>
          </node>
          <node>alternative
            <rate>90.00</rate>
          </node>
        </node>
        <node>Group 2
          <node>Suba
            <node>default
              <rate>1.00</rate>
            </node>
            <node>alternative
              <rate>88.00</rate>
            </node>
          </node>
          <node>Subb
            <node>default
              <rate>200.00</rate>
            </node>
            <node>alternative
              <rate>4.00</rate>
            </node>
          </node>
        </node>
      </node>
    </node>
  </node>
</root>
The output should look like this:
SRV  SUB  PLAN    Group    SubGrp  DefRate  AltRate
101  A    PlanA                    100      90
102  B    PlanZZ  Group 1          100      90
102  B    PlanZZ  Group 2  Suba    1        88
102  B    PlanZZ  Group 2  Subb    200      4
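To illustrate why a fixed number of nested loops can't cover this, here is a small Python 3 sketch that walks the sample recursively and reports how deep each rate-bearing node sits. It assumes that a node holding a <rate> child is the deepest level of its branch; SAMPLE and leaf_paths are names invented for this illustration, and SAMPLE is just the XML above compacted onto fewer lines.

```python
import xml.etree.ElementTree as ET

# The question's sample data, compacted (same labels and rates).
SAMPLE = """<root>
<node>101<node>A<node>PlanA
  <node>default<rate>100.00</rate></node>
  <node>alternative<rate>90.00</rate></node>
</node></node></node>
<node>102<node>B<node>PlanZZ
  <node>Group 1
    <node>default<rate>100.00</rate></node>
    <node>alternative<rate>90.00</rate></node>
  </node>
  <node>Group 2
    <node>Suba
      <node>default<rate>1.00</rate></node>
      <node>alternative<rate>88.00</rate></node>
    </node>
    <node>Subb
      <node>default<rate>200.00</rate></node>
      <node>alternative<rate>4.00</rate></node>
    </node>
  </node>
</node></node></node>
</root>"""

def leaf_paths(elem, path=()):
    """Yield the tuple of node labels leading to every <node> that holds a <rate>."""
    for child in elem.findall("node"):
        label = (child.text or "").strip()  # the label is the text before the first child
        if child.find("rate") is not None:
            yield path + (label,)  # deepest level reached on this branch
        else:
            yield from leaf_paths(child, path + (label,))  # keep descending

for path in leaf_paths(ET.fromstring(SAMPLE)):
    print(len(path), " / ".join(path))
```

The leaves for 101 sit 4 levels down, while those under 102 sit at 5 (Group 1) or 6 (Suba, Subb), so each branch needs a different number of steps.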
Upvotes: 1
Views: 2367
Reputation: 391846
That's why ElementTree has a find() method that takes an XPath expression.
import xml.etree.ElementTree as xml

class Plan( object ):
    """One output record (one row of the expected table)."""
    def __init__( self ):
        self.srv= None
        self.sub= None
        self.plan= None
        self.group= None
        self.subgroup= None
        self.defrate= None
        self.altrate= None
    def initFrom( self, other ):
        # copy the key columns from the parent record; rates are set later
        self.srv= other.srv
        self.sub= other.sub
        self.plan= other.plan
        self.group= other.group
        self.subgroup= other.subgroup
    def __str__( self ):
        return "%s %s %s %s %s %s %s" % (
            self.srv, self.sub, self.plan, self.group, self.subgroup,
            self.defrate, self.altrate )

def setRates( obj, aSearch ):
    # aSearch is a list of <node> elements labelled "default" or "alternative"
    for rate in aSearch:
        if rate.text.strip() == "default":
            obj.defrate= rate.find("rate").text.strip()
        elif rate.text.strip() == "alternative":
            obj.altrate= rate.find("rate").text.strip()
        else:
            raise Exception( "Unexpected Structure" )

def planIter( doc ):
    # walks srv/sub/plan, then an optional group and an optional subgroup level
    for topNode in doc.findall( "node" ):
        obj= Plan()
        obj.srv= topNode.text.strip()
        subNode= topNode.find("node")
        obj.sub= subNode.text.strip()
        planNode= topNode.find("node/node")
        obj.plan= planNode.text.strip()
        l3= topNode.find("node/node/node")
        if l3.text.strip() in ( "default", "alternative" ):
            # rates sit directly under the plan
            setRates( obj, topNode.findall("node/node/node") )
            yield obj
        else:
            for group in topNode.findall("node/node/node"):
                grpObj= Plan()
                grpObj.initFrom( obj )
                grpObj.group= group.text.strip()
                l4= group.find( "node" )
                if l4.text.strip() in ( "default", "alternative" ):
                    # rates sit directly under the group
                    setRates( grpObj, group.findall( "node" ) )
                    yield grpObj
                else:
                    for subgroup in group.findall("node"):
                        subgrpObj= Plan()
                        subgrpObj.initFrom( grpObj )
                        subgrpObj.subgroup= subgroup.text.strip()
                        setRates( subgrpObj, subgroup.findall("node") )
                        yield subgrpObj

# doc_text is assumed to be a string holding the XML from the question
doc = xml.XML( doc_text )
for plan in planIter( doc ):
    print( plan )
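ElementTree's find() and findall() support only a limited subset of XPath, and the relative paths used in planIter() match children at exactly that depth. A small Python 3 sketch of what those paths select on the sample data; SAMPLE is a name chosen for this illustration and holds the question's XML in compacted form.

```python
import xml.etree.ElementTree as ET

# The question's sample data, compacted (same labels and rates).
SAMPLE = """<root>
<node>101<node>A<node>PlanA
  <node>default<rate>100.00</rate></node>
  <node>alternative<rate>90.00</rate></node>
</node></node></node>
<node>102<node>B<node>PlanZZ
  <node>Group 1
    <node>default<rate>100.00</rate></node>
    <node>alternative<rate>90.00</rate></node>
  </node>
  <node>Group 2
    <node>Suba
      <node>default<rate>1.00</rate></node>
      <node>alternative<rate>88.00</rate></node>
    </node>
    <node>Subb
      <node>default<rate>200.00</rate></node>
      <node>alternative<rate>4.00</rate></node>
    </node>
  </node>
</node></node></node>
</root>"""

top = ET.fromstring(SAMPLE).findall("node")[1]  # the <node>102 ... element

# find() returns the first element matching the relative path
print(top.find("node/node").text.strip())  # PlanZZ

# findall() returns every match at that exact depth, in document order
print([n.text.strip() for n in top.findall("node/node/node")])  # ['Group 1', 'Group 2']

# a path that matches nothing gives None: <rate> is not a direct child of 102
print(top.find("rate"))  # None

# ".//" searches descendants at any depth
print([r.text for r in top.findall(".//rate")])
```

The last call finds all six rates under 102 regardless of nesting, but it loses the labels along the way, which is why the code above walks level by level instead.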
Edit
Whoever gave you this XML document needs to find another job. This is A Bad Thing (TM) and indicates a fairly casual disregard for what XML means.
Upvotes: 4
Reputation: 18421
I'm not too familiar with the ElementTree module, but you should be able to walk an element's children (getchildren() in older versions; that method was removed in Python 3.9, so iterate over the element directly instead) and recursively parse the data until there are no more children. This is more pseudocode than anything:
import xml.etree.ElementTree as ET

def parseXml(root, data):
    # INSERT CODE to populate your data object here with the values
    # you want from this node
    for node in root:  # iterating an Element yields its direct children
        parseXml(node, data)

xml_file = "input.xml"  # hypothetical path to your XML file
data = {}  # I'm guessing you want a dict of some sort here to store the data you parse
parseXml(ET.parse(xml_file).getroot(), data)
# data will be filled and ready to use
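Filling in the placeholder, a minimal Python 3 sketch of the recursive approach applied to the question's data. It assumes (the question doesn't state this outright) that a node whose children carry <rate> elements is the deepest level, and that those children are labelled "default" and "alternative"; build_records, to_rows and SAMPLE are names made up for this example. Shorter paths are padded with blank columns so every row lines up with the SRV/SUB/PLAN/Group/SubGrp/DefRate/AltRate layout.

```python
import xml.etree.ElementTree as ET

# The question's sample data, compacted (same labels and rates).
SAMPLE = """<root>
<node>101<node>A<node>PlanA
  <node>default<rate>100.00</rate></node>
  <node>alternative<rate>90.00</rate></node>
</node></node></node>
<node>102<node>B<node>PlanZZ
  <node>Group 1
    <node>default<rate>100.00</rate></node>
    <node>alternative<rate>90.00</rate></node>
  </node>
  <node>Group 2
    <node>Suba
      <node>default<rate>1.00</rate></node>
      <node>alternative<rate>88.00</rate></node>
    </node>
    <node>Subb
      <node>default<rate>200.00</rate></node>
      <node>alternative<rate>4.00</rate></node>
    </node>
  </node>
</node></node></node>
</root>"""

def build_records(elem, path=()):
    """Yield (label path, {rate label: value}) for each node whose children hold rates."""
    rates = {}
    for child in elem.findall("node"):
        label = (child.text or "").strip()
        rate = child.find("rate")
        if rate is not None:
            rates[label] = rate.text.strip()  # leaf: "default" or "alternative"
        else:
            yield from build_records(child, path + (label,))  # go one level deeper
    if rates:
        yield path, rates

def to_rows(root):
    """Turn the records into equal-width rows, padding missing Group/SubGrp columns."""
    records = list(build_records(root))
    width = max(len(path) for path, _ in records)  # deepest path in the document
    rows = []
    for path, rates in records:
        padded = path + ("",) * (width - len(path))
        rows.append(padded + (rates.get("default", ""), rates.get("alternative", "")))
    return rows

for row in to_rows(ET.fromstring(SAMPLE)):
    print(row)
```

Because the recursion only stops where it finds <rate> children, an extra level of sub-groups needs no code change; it simply produces wider rows.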
Upvotes: 0