user9990443

Python xml.dom - access elements that are children of another element

I am trying to access data from an XML file structured as follows:

<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>

I would like to go through all the item elements and, when the name attribute matches, retrieve the description. So far I can retrieve all the items and print the name attribute, but I can't find a way to access the child elements description and year.

from xml.dom import minidom

xmldoc = minidom.parse("myfile.xml")
# This retrieves all the item elements
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))
# This prints the name attribute of the first item
print(itemlist[0].attributes['name'].value)
# This raises a KeyError, although I can see that description is a child element of itemlist[1]
print(itemlist[1].attributes['description'].value)

I am not sure how to access the sub-elements, since they are children of the item element. Do I need to build another element list from each item to retrieve the description element and read its value? Or am I totally off?

Upvotes: 0

Views: 648

Answers (2)

balderman

Reputation: 23815

A one-liner using ElementTree:

import xml.etree.ElementTree as ET

xml = '''
<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>'''

root = ET.fromstring(xml)
data = [(i.attrib['name'], i.find('./description').text) for i in root.findall('.//item')]
print(data)

Output:

[('jack', 'the headhunter'), ('joe', 'the butler'), ('david', 'guest')]
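Since the question asks for the description of one matching name, here is a minimal sketch of a lookup built on the same ElementTree approach. The helper name `describe` is made up for illustration, and the predicate assumes the name contains no single quotes.

```python
import xml.etree.ElementTree as ET

XML = """<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>"""


def describe(root, name):
    """Hypothetical helper: return (description, year) for the item whose
    name attribute equals `name`, or None if there is no such item."""
    # ElementTree's limited XPath supports [@attribute='value'] predicates.
    # Assumption: `name` contains no single quote.
    item = root.find(f".//item[@name='{name}']")
    if item is None:
        return None
    # findtext returns the text of the first matching child element.
    return item.findtext("description"), item.findtext("year")


root = ET.fromstring(XML)
print(describe(root, "joe"))  # ('the butler', '1995')
```

The values come back as strings, so the year would need `int(...)` if it is to be used as a number.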

Upvotes: 1

Roy2012

Reputation: 12523

Here's a way to extract the data. Not sure it's the most elegant one, but it works:

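from xml.dom import minidom

xmldoc = minidom.parse("myfile.xml")
for item in xmldoc.getElementsByTagName("item"):
    # name is an attribute of <item>
    name = item.attributes.getNamedItem("name").value
    print(f"name is {name}")
    # description is a child element; its text lives in a child text node
    desc = item.getElementsByTagName("description")[0].childNodes[0].data
    print(f"description is {desc}")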

The output is:

name is jack
description is the headhunter
name is joe
description is the butler
name is david
description is guest

Note that the documentation of minidom is, well, kind of lacking. But it (mostly) implements the W3C DOM standard, so the DOM documentation covers the methods used here.
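To retrieve the description and year only for a matching name, something along these lines should work. This is a sketch rather than a definitive implementation: `child_text` and `find_item` are hypothetical helper names, and the XML is inlined with `parseString` so the snippet runs without `myfile.xml`.

```python
from xml.dom import minidom

XML = """<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>"""


def child_text(element, tag):
    """Hypothetical helper: text of the first <tag> element under `element`,
    or None if the tag is missing or empty."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    # The text is stored in a text node below the element, not on the element.
    return nodes[0].firstChild.data


def find_item(doc, name):
    """Hypothetical helper: (description, year) of the first item whose
    name attribute equals `name`, or None if nothing matches."""
    for item in doc.getElementsByTagName("item"):
        # getAttribute returns "" when the attribute is absent.
        if item.getAttribute("name") == name:
            return child_text(item, "description"), child_text(item, "year")
    return None


doc = minidom.parseString(XML)
print(find_item(doc, "david"))  # ('guest', '2000')
```

The KeyError in the question comes from looking up description in `attributes`: only name is an attribute of item, while description and year are child elements.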

Upvotes: 0
