Reputation: 159
I'm new to both Python and XML. I have looked at the previous posts on the topic, but I can't figure out how to do exactly what I need, although it seems simple enough in principle.
<Project>
  <Items>
    <Item>
      <Code>A456B</Code>
      <Database>
        <Data>
          <Id>mountain</Id>
          <Value>12000</Value>
        </Data>
        <Data>
          <Id>UTEM</Id>
          <Value>53.2</Value>
        </Data>
      </Database>
    </Item>
    <Item>
      <Code>A786C</Code>
      <Database>
        <Data>
          <Id>mountain</Id>
          <Value>5000</Value>
        </Data>
        <Data>
          <Id>UTEM</Id>
          <Value></Value>
        </Data>
      </Database>
    </Item>
  </Items>
</Project>
All I want to do is extract all of the Codes, Values and Ids, which is no problem:
import xml.etree.ElementTree as ET

name = 'example tree.xml'
tree = ET.parse(name)
root = tree.getroot()

codes = []
ids = []
val = []

# collect the text of every Code, Id and Value element in document order
for code in root.iter('Code'):
    codes.append(code.text)
for id_elem in root.iter('Id'):
    ids.append(id_elem.text)
for value in root.iter('Value'):
    val.append(value.text)

print(codes)
print(ids)
print(val)
['A456B', 'A786C']
['mountain', 'UTEM', 'mountain', 'UTEM']
['12000', '53.2', '5000', None]
I want to know which Ids and Values go with which Code: something like a dictionary of dictionaries, or perhaps a list of DataFrames with the Ids as the row index and the Code as the column header.
For example:
A456B = {mountain:12000, UTEM:53.2}
A786C = {mountain:5000, UTEM: None}
Eventually I want to use the Values to feed an equation.
Note that the real XML file might not contain the same number of Ids and Values under each Code. Also, the Ids and Values might differ from one Code section to another.
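To make the goal concrete, here is a rough sketch of how such a structure could feed an equation. The formula and the names `equation` and `evaluate` are made up for illustration, and the values are assumed to have already been converted to numbers, with None marking an empty Value.

```python
# Hypothetical target structure: one inner dict per Code, keyed by Id.
# Values are already numbers here; None marks an empty <Value>.
data = {
    "A456B": {"mountain": 12000.0, "UTEM": 53.2},
    "A786C": {"mountain": 5000.0, "UTEM": None},
}


def equation(mountain, utem):
    """Placeholder formula, invented purely for illustration."""
    return mountain * utem


def evaluate(data):
    """Apply the placeholder equation to every Code that has both inputs."""
    results = {}
    for code, values in data.items():
        mountain = values.get("mountain")  # .get() tolerates a missing Id
        utem = values.get("UTEM")
        if mountain is None or utem is None:
            results[code] = None  # not enough data for this Code
        else:
            results[code] = equation(mountain, utem)
    return results


computed = evaluate(data)
print(computed)
```

Looking the Ids up with `.get()` means a Code that lacks one of them is skipped rather than raising a KeyError.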
Sorry if this question is elementary or unclear... I've only been doing Python for a month :/
Upvotes: 2
Views: 165
Reputation: 2677
BeautifulSoup is a very useful module for parsing HTML and XML.
from bs4 import BeautifulSoup
import os

# read the file into a BeautifulSoup object; an HTML parser lowercases
# tag names, which is why the lookups below use 'item', 'code', etc.
with open(os.path.join(os.getcwd(), "input.txt")) as f:
    soup = BeautifulSoup(f, "html.parser")

results = {}
# parse the data, and put it into a dict, where the values are dicts
for item in soup.find_all('item'):
    # assemble dicts on the fly using a dict comprehension:
    # http://stackoverflow.com/a/14507637/4400277
    results[item.code.text] = {data.id.text: data.value.text
                               for data in item.find_all('data')}
>>> results
{u'A786C': {u'mountain': u'5000', u'UTEM': u''},
 u'A456B': {u'mountain': u'12000', u'UTEM': u'53.2'}}
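Note that the empty Value comes back as an empty string here, not the None shown in the question, and every value is still text. A minimal post-processing sketch, assuming `results` has the shape printed above (the helper name `to_number` is made up for illustration):

```python
def to_number(text):
    """Return float(text), or None for empty or non-numeric text."""
    if text is None or not text.strip():
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Same shape as the `results` dict printed above.
results = {
    "A786C": {"mountain": "5000", "UTEM": ""},
    "A456B": {"mountain": "12000", "UTEM": "53.2"},
}

# Rebuild the nested dict with every value converted.
cleaned = {
    code: {key: to_number(value) for key, value in values.items()}
    for code, values in results.items()
}
print(cleaned)
```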
Upvotes: 1
Reputation: 313
This might be what you want:
import xml.etree.ElementTree as ET

name = 'test.xml'
tree = ET.parse(name)
root = tree.getroot()

codes = {}
for item in root.iter('Item'):
    code = item.find('Code').text
    codes[code] = {}
    for datum in item.iter('Data'):
        # an empty or missing Value element is stored as None
        if datum.find('Value') is not None:
            value = datum.find('Value').text
        else:
            value = None
        # only record the pair when the Data element has an Id
        if datum.find('Id') is not None:
            data_id = datum.find('Id').text
            codes[code][data_id] = value
print(codes)
This produces:
{'A456B': {'mountain': '12000', 'UTEM': '53.2'}, 'A786C': {'mountain': '5000', 'UTEM': None}}
This iterates over all Item elements and, for each one, creates a dict key pointing to a dict of Id/Value pairs. A pair is only created if the Data element contains an Id element; an empty or missing Value is stored as None.
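The same loop can be tried without a file by parsing an inline string. This is only a sketch: the sample XML is invented to exercise the cases the question mentions (different Ids per Code, an empty Value, a Data element with no Id), and `parse_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Inline sample, invented for illustration: the second Item has an extra
# Id, an empty <Value>, and a <Data> with no <Id> at all.
XML_TEXT = """
<Project><Items>
  <Item>
    <Code>A456B</Code>
    <Database>
      <Data><Id>mountain</Id><Value>12000</Value></Data>
    </Database>
  </Item>
  <Item>
    <Code>A786C</Code>
    <Database>
      <Data><Id>mountain</Id><Value>5000</Value></Data>
      <Data><Id>UTEM</Id><Value></Value></Data>
      <Data><Value>99</Value></Data>
    </Database>
  </Item>
</Items></Project>
"""


def parse_items(xml_text):
    """Map each Code to a dict of its Id/Value pairs."""
    root = ET.fromstring(xml_text.strip())
    codes = {}
    for item in root.iter("Item"):
        pairs = {}
        for datum in item.iter("Data"):
            id_elem = datum.find("Id")
            if id_elem is None:
                continue  # no Id element, so nothing to key the value on
            value_elem = datum.find("Value")
            # .text is None for an empty element such as <Value></Value>
            pairs[id_elem.text] = None if value_elem is None else value_elem.text
        codes[item.find("Code").text] = pairs
    return codes


parsed = parse_items(XML_TEXT)
print(parsed)
```

Each Code ends up with only the Ids it actually contains, so the inner dicts may have different keys.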
Upvotes: 0