Reputation: 13

Python XML - build flat record from dynamic nested "node" elements

I need to parse an XML file and build a record-based output from the data. The problem is that the XML is in a "generic" form, in that it has several levels of nested "node" elements that represent some sort of data structure. I need to build the records dynamically based on the deepest level of the "node" element. Some example XML and expected output are at the bottom.

I am most familiar with Python's ElementTree, so I'd prefer to use that, but I just can't wrap my head around a way to build the output record dynamically when the node depth varies. We also can't assume that the nested nodes will be a fixed number of levels deep, so hardcoding a loop for each level isn't possible. Is there a way to parse the XML and build the output on the fly?

Some Additional Notes:

The node names are all "node" except the parent and the detail info (rate, price, etc.).
The node depth is not static, so assume there can be further levels than shown in the sample.
Each "level" can have multiple sub-levels, so you need to loop over each child "node" to properly build each record.

Any ideas / input would be greatly appreciated.

<root>
   <node>101
      <node>A
         <node>PlanA     
            <node>default
                <rate>100.00</rate>
            </node>
            <node>alternative
                <rate>90.00</rate>
            </node>
         </node>
      </node>
   </node>
   <node>102
      <node>B
         <node>PlanZZ     
            <node>Group 1
               <node>default
                   <rate>100.00</rate>
               </node>
               <node>alternative
                   <rate>90.00</rate>
               </node>
            </node>
            <node>Group 2
               <node>Suba
                  <node>default
                      <rate>1.00</rate>
                  </node>
                      <node>alternative
                      <rate>88.00</rate>
                  </node>
               </node>
               <node>Subb
                  <node>default
                      <rate>200.00</rate>
                  </node>
                      <node>alternative
                      <rate>4.00</rate>
                  </node>
               </node>
            </node>
         </node>
      </node>  
   </node>
</root>

The Output would look like this:

SRV  SUB  PLAN    Group    SubGrp  DefRate   AltRate
101  A    PlanA                    100       90
102  B    PlanZZ  Group 1          100       90
102  B    PlanZZ  Group 2  Suba    1         88
102  B    PlanZZ  Group 2  Subb    200       4

Upvotes: 1

Answers (2)

S.Lott

Reputation: 391846

That's what ElementTree's find and findall methods are for: both accept a path expression (a limited subset of XPath) such as "node/node".

class Plan( object ):
    def __init__( self ):
        self.srv= None
        self.sub= None
        self.plan= None
        self.group= None
        self.subgroup= None
        self.defrate= None
        self.altrate= None
    def initFrom( self, other ):
        self.srv= other.srv
        self.sub= other.sub
        self.plan= other.plan
        self.group= other.group
        self.subgroup= other.subgroup
    def __str__( self ):
        return "%s %s %s %s %s %s %s" % (
            self.srv, self.sub, self.plan, self.group, self.subgroup,
            self.defrate, self.altrate )

def setRates( obj, aSearch ):
    for rate in aSearch:
        if rate.text.strip() == "default":
            obj.defrate= rate.find("rate").text.strip()
        elif rate.text.strip() == "alternative":
            obj.altrate= rate.find("rate").text.strip()
        else:
            raise Exception( "Unexpected Structure" )

def planIter( doc ):
    for topNode in doc.findall( "node" ):
        obj= Plan()
        obj.srv= topNode.text.strip()
        subNode= topNode.find("node")
        obj.sub= subNode.text.strip()
        planNode= topNode.find("node/node")
        obj.plan= planNode.text.strip()
        l3= topNode.find("node/node/node")
        if l3.text.strip() in ( "default", "alternative" ):
            setRates( obj, topNode.findall("node/node/node") )
            yield obj
        else:
            for group in topNode.findall("node/node/node"):
                grpObj= Plan()
                grpObj.initFrom( obj )
                grpObj.group= group.text.strip()
                l4= group.find( "node" )
                if l4.text.strip() in ( "default", "alternative" ):
                    setRates( grpObj, group.findall( "node" ) )
                    yield grpObj
                else:
                    for subgroup in group.findall("node"):
                        subgrpObj= Plan()
                        subgrpObj.initFrom( grpObj )
                        subgrpObj.subgroup= subgroup.text.strip()
                        setRates( subgrpObj, subgroup.findall("node") )
                        yield subgrpObj

import xml.etree.ElementTree as xml

# xml_text is assumed to hold the XML document from the question as a string
doc = xml.XML( xml_text )

for plan in planIter( doc ):
    print( plan )
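
To make the path expressions used above more concrete, here is a minimal sketch of how find and findall resolve relative paths such as "node/node". The SAMPLE string is a hypothetical, trimmed-down copy of the question's XML, and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the XML from the question.
SAMPLE = """
<root>
  <node>101
    <node>A
      <node>PlanA
        <node>default<rate>100.00</rate></node>
        <node>alternative<rate>90.00</rate></node>
      </node>
    </node>
  </node>
</root>
"""

root = ET.fromstring(SAMPLE)

top = root.find("node")                     # first <node> directly under <root>
plan = top.find("node/node")                # first <node> two levels below top
rate_nodes = top.findall("node/node/node")  # every <node> three levels below top

# The label of each level is the element's own text, which carries
# trailing whitespace from the indentation, hence the strip().
top_label = top.text.strip()
plan_label = plan.text.strip()
rate_labels = [n.text.strip() for n in rate_nodes]
rates = [n.find("rate").text.strip() for n in rate_nodes]

print(top_label, plan_label, rate_labels, rates)
```

Note that every path here is relative to the element it is called on, which is why each extra level needs one more "/node" segment.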

Edit

Whoever gave you this XML document needs to find another job. This is A Bad Thing (TM) and indicates a fairly casual disregard for what XML means.

Upvotes: 4

Jason Coon

Reputation: 18421

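I'm not too familiar with the ElementTree module, but you should be able to iterate over an element's children (the getchildren() method did this, but it was removed in Python 3.9; iterating over the element itself does the same) and recursively parse the data until there are no more children. This is more pseudo-code than anything: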

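from xml.etree.ElementTree import parse

def parseXml(root, data):
    # Populate your data object here with the values you want from this
    # node. As a placeholder, this collects each element's stripped text
    # under its tag name.
    data.setdefault(root.tag, []).append((root.text or "").strip())
    # getchildren() was removed in Python 3.9; iterating over an element
    # yields its direct children.
    for node in root:
        parseXml(node, data)

xml_path = "input.xml"  # hypothetical path to the XML file
data = {}  # I'm guessing you want a dict of some sort here to store the data you parse
parseXml(parse(xml_path).getroot(), data)
# data will be filled and ready to use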
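
As a rough sketch of how that recursion could produce the flat records from the question without hardcoding the depth: carry the labels seen so far down the tree, and emit a record whenever a node's children hold rate elements. This assumes that any "node" with a direct "rate" child is a rate entry (such as "default" or "alternative") rather than another level; the flatten function and the SAMPLE string are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

def flatten(element, path=()):
    """Yield (labels, rates) for each branch that ends in rate-bearing nodes."""
    children = element.findall("node")
    # Assumption: a child <node> with a direct <rate> child is a rate entry.
    rate_nodes = [c for c in children if c.find("rate") is not None]
    if rate_nodes:
        rates = {(c.text or "").strip(): c.find("rate").text.strip()
                 for c in rate_nodes}
        yield path, rates
    # Recurse into the remaining children, extending the label path.
    for child in children:
        if child.find("rate") is None:
            label = (child.text or "").strip()
            yield from flatten(child, path + (label,))

# Hypothetical, trimmed-down version of the XML from the question.
SAMPLE = """
<root>
  <node>101
    <node>A
      <node>PlanA
        <node>default<rate>100.00</rate></node>
        <node>alternative<rate>90.00</rate></node>
      </node>
    </node>
  </node>
  <node>102
    <node>B
      <node>PlanZZ
        <node>Group 1
          <node>default<rate>100.00</rate></node>
          <node>alternative<rate>90.00</rate></node>
        </node>
        <node>Group 2
          <node>Suba
            <node>default<rate>1.00</rate></node>
            <node>alternative<rate>88.00</rate></node>
          </node>
        </node>
      </node>
    </node>
  </node>
</root>
"""

records = list(flatten(ET.fromstring(SAMPLE)))

# Pad the shorter label paths so every row has the same number of columns.
width = max(len(labels) for labels, _ in records)
rows = [
    list(labels) + [""] * (width - len(labels))
    + [rates.get("default", ""), rates.get("alternative", "")]
    for labels, rates in records
]

for row in rows:
    print("\t".join(row))
```

Because the number of label columns is taken from the deepest branch found, deeper documents simply produce wider rows; the column headings would still have to come from somewhere else, since the XML does not name its levels.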

Upvotes: 0
