I am trying to access data from an XML file structured as follows:
<datafile>
  <header>
    <name>catalogue</name>
    <description>the description</description>
  </header>
  <item name="jack">
    <description>the headhunter</description>
    <year>1981</year>
  </item>
  <item name="joe">
    <description>the butler</description>
    <year>1995</year>
  </item>
  <item name="david">
    <description>guest</description>
    <year>2000</year>
  </item>
</datafile>
I would like to go through all the name attributes and, when one matches, retrieve the corresponding description.
So far I can retrieve all the item elements, and I can print out the name attribute, but I can't find a way to access the child elements description and year.
from xml.dom import minidom
xmldoc = minidom.parse("myfile.xml")
# This does retrieve all the item elements
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))
# This does print the name of the first element
print(itemlist[0].attributes['name'].value)
# This gives me a KeyError, although I can see that description is a child element of the item
print(itemlist[1].attributes['description'].value)
I am not sure how to access the sub-elements, since they are children of the item element. Do I need to build another list from each item element to retrieve the description and access its value? Or am I totally off?
Upvotes: 0
Views: 648
Reputation: 23815
A one-liner using ElementTree:
import xml.etree.ElementTree as ET
xml = '''
<datafile>
<header>
<name>catalogue</name>
<description>the description</description>
</header>
<item name="jack">
<description>the headhunter</description>
<year>1981</year>
</item>
<item name="joe">
<description>the butler</description>
<year>1995</year>
</item>
<item name="david">
<description>guest</description>
<year>2000</year>
</item>
</datafile>'''
root = ET.fromstring(xml)
data = [(i.attrib['name'],i.find('./description').text) for i in root.findall('.//item')]
print(data)
Output:
[('jack', 'the headhunter'), ('joe', 'the butler'), ('david', 'guest')]
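Since the question also asks to pick out the item whose name matches and to read its year, here is a slightly extended sketch along the same lines. It assumes the XML is held in a string (called XML_DATA here purely for illustration) and uses findtext(), which returns the text of the first matching child or None, together with ElementTree's limited XPath support for an [@name='...'] attribute predicate. The records name is illustrative, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the question (XML_DATA is an illustrative name).
XML_DATA = """<datafile>
  <header>
    <name>catalogue</name>
    <description>the description</description>
  </header>
  <item name="jack">
    <description>the headhunter</description>
    <year>1981</year>
  </item>
  <item name="joe">
    <description>the butler</description>
    <year>1995</year>
  </item>
  <item name="david">
    <description>guest</description>
    <year>2000</year>
  </item>
</datafile>"""

root = ET.fromstring(XML_DATA)

# Map each item's name attribute to (description, year).
# item.get() reads an attribute; item.findtext() reads a child element's text.
records = {
    item.get("name"): (item.findtext("description"), int(item.findtext("year")))
    for item in root.iter("item")
}
print(records["david"])  # ('guest', 2000)

# Alternatively, select a single item by attribute value with an XPath predicate.
joe = root.find(".//item[@name='joe']")
print(joe.findtext("description"))  # the butler
```

Building the dictionary once is convenient when you need several lookups. The predicate form is handy for a one-off query, but a name containing a single quote would break the XPath string, so the dictionary (or comparing item.get("name") in a loop) is the safer choice for arbitrary input.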
Upvotes: 1
Reputation: 12523
Here's a way to extract the data. Not sure it's the most elegant one, but it works:
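from xml.dom import minidom

xmldoc = minidom.parse("myfile.xml")
for item in xmldoc.getElementsByTagName("item"):
    name = item.attributes.getNamedItem("name").value
    print(f"name is {name}")
    # description is a child element, so look it up under the item and read its text node
    desc = item.getElementsByTagName("description")[0].childNodes[0].data
    print(f"description is {desc}")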
The output is:
name is jack
description is the headhunter
name is joe
description is the butler
name is david
description is guest
Note that the minidom documentation is, well, somewhat lacking. However, minidom (mostly) implements the DOM standard, so the DOM documentation is a useful reference; see the documentation here.
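The KeyError in the question comes from the fact that attributes only holds XML attributes (here just name); description and year are child elements, so they have to be looked up under each item element. Below is a minimal sketch that extends the loop above to also read year. It assumes the same sample XML, parsed from a string with minidom.parseString() instead of from myfile.xml; the child_text helper, XML_DATA and the records dict are hypothetical names introduced only for illustration.

```python
from xml.dom import minidom

# Same sample document as in the question (XML_DATA is an illustrative name).
XML_DATA = """<datafile>
  <header>
    <name>catalogue</name>
    <description>the description</description>
  </header>
  <item name="jack">
    <description>the headhunter</description>
    <year>1981</year>
  </item>
  <item name="joe">
    <description>the butler</description>
    <year>1995</year>
  </item>
  <item name="david">
    <description>guest</description>
    <year>2000</year>
  </item>
</datafile>"""


def child_text(element, tag):
    """Hypothetical helper: text of the first <tag> descendant of element, or None if absent."""
    nodes = element.getElementsByTagName(tag)
    if not nodes:
        return None
    # Join all text nodes in case the content is split across several of them.
    return "".join(n.data for n in nodes[0].childNodes if n.nodeType == n.TEXT_NODE)


xmldoc = minidom.parseString(XML_DATA)
items = xmldoc.getElementsByTagName("item")

# description is not an attribute, which is why attributes['description'] fails.
print(items[0].hasAttribute("description"))  # False

records = {}
for item in items:
    name = item.getAttribute("name")  # the only attribute on <item>
    records[name] = (child_text(item, "description"), int(child_text(item, "year")))

print(records["joe"])  # ('the butler', 1995)
```

Unlike childNodes[0].data, the helper returns None when the child element is missing, so an item without a year would need a check before the int() call (int(None) raises a TypeError).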
Upvotes: 0