Pythonizer

Reputation: 1194

XML subtree parsing

I have to parse an XML file using lxml or the xml.etree.ElementTree module:

<?xml version="1.0"?>
<corners>
  <version>1.05</version>
  <process>
    <name>ss649</name>
    <statistics>
      <statistic name="Min" forparameter="modname" isextreme="no" style="tbld">
        <value>0.00073</value>
        <real_value>7.300e-10</real_value>
      </statistic>
      <statistic name="Max" forparameter="modname" isextreme="no" style="tbld">
        <value>0.32420</value>
        <real_value>3.242e-07</real_value>
     </statistic>
     <variant>
          <name>Unit</name>
          <value>
            <value>Size</value>
            <statistics>
              <statistic name="Min" forparameter="modname1" isextreme="no" style="tbld">
                <value>0.02090</value>
                <real_value>2.090e-08</real_value>
              </statistic>
              <statistic name="Max" forparameter="modname2" isextreme="no" style="tbld">
                <value>0.02090</value>
                <real_value>2.090e-08</real_value>
              </statistic>
         </variant>

I have to extract all the values and build a dict from them, but I can't access the subtrees. How do I do that?

I'm trying to create a dict that looks like this:

 dict={
      'modname' : {
        'Min' : 0.00073,
        'Max': 0.32420,
       }
 }

Upvotes: 3

Views: 5514

Answers (3)

PYPL

Reputation: 1849

I used the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

result = {}   # renamed from `dict` to avoid shadowing the built-in
tree = ET.parse('file.xml')
root = tree.getroot()
for stats in root.iter('statistics'):          # every <statistics> block, at any depth
    for stat in stats.findall('statistic'):    # its direct <statistic> children
        name = stat.get('name')
        parameter = stat.get('forparameter')
        value = stat.findtext('value')         # first <value> child; <real_value> is ignored
        if value is None:
            continue
        # Store inside the loop so every statistic is recorded, not only the last one
        result.setdefault(parameter, {})[name] = float(value)

Output:

dict={
      'modname' : { 
        'Min' : 0.00073,
        'Max': 0.32420,
       }
}
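
If only the statistics directly under <process> are wanted, which is what the output above suggests, an explicit path can replace the recursive search. The following is a minimal sketch rather than part of the original answer: the helper name `collect` is hypothetical, the sample is a shortened copy of the question's data, and the closing tags after the nested block are assumed because the question's listing is cut off.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data. The closing tags after the nested
# <statistics> block are an assumption; the original listing is truncated.
XML = """<?xml version="1.0"?>
<corners>
  <version>1.05</version>
  <process>
    <name>ss649</name>
    <statistics>
      <statistic name="Min" forparameter="modname" isextreme="no" style="tbld">
        <value>0.00073</value>
        <real_value>7.300e-10</real_value>
      </statistic>
      <statistic name="Max" forparameter="modname" isextreme="no" style="tbld">
        <value>0.32420</value>
        <real_value>3.242e-07</real_value>
      </statistic>
      <variant>
        <name>Unit</name>
        <value>
          <value>Size</value>
          <statistics>
            <statistic name="Min" forparameter="modname1" isextreme="no" style="tbld">
              <value>0.02090</value>
              <real_value>2.090e-08</real_value>
            </statistic>
          </statistics>
        </value>
      </variant>
    </statistics>
  </process>
</corners>"""


def collect(root, path):
    """Map forparameter -> {name: float(value)} for <statistic> elements matched by path."""
    result = {}
    for stat in root.findall(path):
        text = stat.findtext("value")   # direct <value> child only
        if text is None:
            continue
        result.setdefault(stat.get("forparameter"), {})[stat.get("name")] = float(text)
    return result


root = ET.fromstring(XML)
top_level = collect(root, "./process/statistics/statistic")  # direct statistics only
everything = collect(root, ".//statistic")                   # nested ones as well
print(top_level)
print(everything)
```

With this sample, `top_level` holds only the 'modname' entry, while `everything` also picks up 'modname1' from inside <variant>.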

Upvotes: 2

alecxe

Reputation: 473853

xmltodict is definitely something you should consider using:

from pprint import pprint
import xmltodict

data = """<?xml version="1.0"?>
<corners>
  <version>1.05</version>
  <process>
    <name>ss649</name>
    <statistics>
      <statistic name="Min" forparameter="modname" isextreme="no" style="tbld">
        <value>0.00073</value>
        <real_value>7.300e-10</real_value>
      </statistic>
      <statistic name="Max" forparameter="modname" isextreme="no" style="tbld">
        <value>0.32420</value>
        <real_value>3.242e-07</real_value>
     </statistic>
    </statistics>
  </process>
</corners>"""

pprint(xmltodict.parse(data))

One line of code and you are good to go.
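
The parsed result still mirrors the XML layout, so getting to the dict asked for in the question takes one more step. Below is a rough sketch of that step, under some assumptions: `stats_from_parsed` is a hypothetical helper, and `parsed` is a hand-written stand-in for what `xmltodict.parse(data)` returns for the sample above (attributes appear under keys prefixed with '@'), so the snippet runs without the library installed.

```python
def stats_from_parsed(parsed):
    """Build {forparameter: {name: float(value)}} from xmltodict-style output."""
    stats = parsed["corners"]["process"]["statistics"]["statistic"]
    # A lone <statistic> comes back as a single mapping rather than a list.
    if not isinstance(stats, list):
        stats = [stats]
    result = {}
    for stat in stats:
        result.setdefault(stat["@forparameter"], {})[stat["@name"]] = float(stat["value"])
    return result


# Hand-written stand-in for xmltodict.parse(data); with the library installed,
# pass the real parse result instead.
parsed = {
    "corners": {
        "version": "1.05",
        "process": {
            "name": "ss649",
            "statistics": {
                "statistic": [
                    {"@name": "Min", "@forparameter": "modname", "@isextreme": "no",
                     "@style": "tbld", "value": "0.00073", "real_value": "7.300e-10"},
                    {"@name": "Max", "@forparameter": "modname", "@isextreme": "no",
                     "@style": "tbld", "value": "0.32420", "real_value": "3.242e-07"},
                ]
            },
        },
    }
}

print(stats_from_parsed(parsed))
```

The list check matters because a document with only one <statistic> would otherwise be iterated key by key.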

Hope that works for you.

Upvotes: 2

ebarr

Reputation: 7842

You may wish to have a look at this rather nice ActiveState snippet:

http://code.activestate.com/recipes/410469-xml-as-dictionary/

I came across this via the following SO post, which may also be of use:

How to convert an xml string to a dictionary in Python?

Also xmltodict would be a good option:

https://github.com/martinblech/xmltodict
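
For a feel of what such XML-to-dict converters do, here is a small sketch of the general idea using only the standard library. It is not the code from the linked recipe or from xmltodict; `element_to_dict` is a hypothetical helper, and the '@' and '#text' key conventions are simply borrowed for illustration.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into nested dicts, lists and strings."""
    children = list(elem)
    text = (elem.text or "").strip()
    # A leaf without attributes collapses to its text.
    if not children and not elem.attrib:
        return text
    # Attributes are stored under '@'-prefixed keys so they cannot clash with tags.
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags are gathered into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if text:
        node["#text"] = text
    return node


root = ET.fromstring("<a><b x='1'>hi</b><b>there</b><c/></a>")
converted = {root.tag: element_to_dict(root)}
print(converted)
```

Repeated tags become lists and attributes sit next to child elements, which is why a reshaping step is usually still needed to reach a custom layout like the one in the question.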

Upvotes: 0
