cmj29607
cmj29607

Reputation: 159

parse a section of an XML file with python

Im new to both python and xml. Have looked at the previous posts on the topic, and I cant figure out how to do exactly what I need to. Although it seems to be simple enough in principle.

<Project>
 <Items>
  <Item>
   <Code>A456B</Code>
   <Database>
    <Data>
     <Id>mountain</Id>
     <Value>12000</Value>
    </Data>
    <Data>
     <Id>UTEM</Id>
     <Value>53.2</Value>
    </Data>
   </Database>
  </Item>
  <Item>
   <Code>A786C</Code>
   <Database>
    <Data>
     <Id>mountain</Id>
     <Value>5000</Value>
    </Data>
    <Data>
     <Id>UTEM</Id>
     <Value></Value>
    </Data>
   </Database>
  </Item>
 </Items>
</Project> 

All I want to do is extract all of the Codes, Values and ID's, which is no problem.

import xml.etree.cElementTree as ET

name = 'example tree.xml'
tree = ET.parse(name)
root = tree.getroot()
codes=[]
ids=[]
val=[]
for db in root.iter('Code'):
    codes.append(db.text)
for ID in root.iter('Id'):
    ids.append(ID.text)
for VALUE in root.iter('Value'):
    val.append(VALUE.text)
print codes
print ids
print val

['A456B', 'A786C']
['mountain', 'UTEM', 'mountain', 'UTEM']
['12000', '53.2', '5000', None]

I want to know which Ids and Values go with which Code. Something like a dictionary of dictionaries maybe OR perhaps a list of DataFrames with the row index being the Id, and the column header being Code.

for example

A456B = {mountain:12000, UTEM:53.2}
A786C = {mountain:5000, UTEM: None}

Eventually I want to use the Values to feed an equation.

Note that the real xml file might not contain the same number of Ids and Values in each Code. Also, Id and Value might be different from one Code section to another.

Sorry if this question is elementary, or unclear...I've only been doing python for a month :/

Upvotes: 2

Views: 165

Answers (2)

Boa
Boa

Reputation: 2677

BeautifulSoup is a very useful module for parsing HTML and XML.

from bs4 import BeautifulSoup
import os

# read the file into a BeautifulSoup object
soup = BeautifulSoup(open(os.getcwd() + "\\input.txt"))

results = {}

# parse the data, and put it into a dict, where the values are dicts
for item in soup.findAll('item'):
    # assemble dicts on the fly using a dict comprehension:
    # http://stackoverflow.com/a/14507637/4400277
    results[item.code.text] = {data.id.text:data.value.text for data in item.findAll('data')}

>>> results
{u'A786C': {u'mountain': u'5000', u'UTEM': u''}, 
 u'A456B': {u'mountain': u'12000', u'UTEM': u'53.2'}

Upvotes: 1

Kiran Tomlinson
Kiran Tomlinson

Reputation: 313

This might be what you want:

import xml.etree.cElementTree as ET

name = 'test.xml'
tree = ET.parse(name)
root = tree.getroot()
codes={}

for item in root.iter('Item'):
    code = item.find('Code').text
    codes[code] = {}

    for datum in item.iter('Data'):
        if datum.find('Value') is not None:
            value = datum.find('Value').text
        else:
            value = None
        if datum.find('Id') is not None:
            id = datum.find('Id').text
            codes[code][id] = value

print codes

This produces: {'A456B' : {'mountain' : '12000', 'UTEM' : '53.2'}, 'A786C' : {'mountain' : '5000', 'UTEM' : None}}

This iterates over all Item tags, and for each one, creates a dict key pointing to a dict of id/value pairs. An id/data pair is only created if the Id tag is not empty.

Upvotes: 0

Related Questions