Moayyad Yaghi

Reputation: 3722

Getting the attributes of specific XML nodes

This might be a newbie question :) but it's been bugging me, since I'm new to XML. I have the following XML file:

<assetsMain>
  <assetParent type='character' shortName='char'>
    <asset>
      pub
    </asset>
    <asset>
      car
    </asset>
  </assetParent>
  <assetParent type='par' shortName='pr'>
    <asset>
      camera
    </asset>
    <asset>
      rig
    </asset>
  </assetParent>
</assetsMain>

Is it possible to retrieve all <assetParent> nodes, along with all of their attributes and the text of their children? For example, a result like the following:

[ [['character','char'],['pub','car']],
  [['par','pr'],['camera','rig']]
]

By the way, I'm using DOM and Python 2.6.

Thanks in advance.

Upvotes: 1

Views: 811

Answers (2)

Mizipzor

Reputation: 52331

This code gives the output you want:

from xml.dom.minidom import parseString

document = """\
<assetsMain>
  <assetParent type='character' shortName='char'>
    <asset>
      pub
    </asset>
    <asset>
      car
    </asset>
  </assetParent>
  <assetParent type='par' shortName='pr'>
    <asset>
      camera
    </asset>
    <asset>
      rig
    </asset>
  </assetParent>
</assetsMain>
"""

def getNestedList():
    dom = parseString(document)
    li = []
    # documentElement is the root element even if a comment precedes it
    for assetParent in dom.documentElement.getElementsByTagName("assetParent"):
        # read type and shortName
        a = [assetParent.getAttribute("type"), assetParent.getAttribute("shortName")]
        # read content of asset nodes
        b = [asset.childNodes[0].data.strip() for asset in assetParent.getElementsByTagName("asset")]
        # put the lists together in a list and add them to the list (!)
        li.append([a,b])
    return li

if __name__=="__main__":
    print getNestedList()

Note that getElementsByTagName lets us select which descendant nodes we want to read. Attributes are read with getAttribute on a node. The text inside a node is itself a child node, and its content is read through that child's data property. If you are reading text inside a node, you can check that the child really is a text node with:

if node.nodeType == node.TEXT_NODE:

Also note that there is no checking or error handling here. Nodes lacking child nodes will raise an IndexError.
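
As a rough sketch of how the text-node check and the missing error handling could be combined, the helper below collects only the direct text children of an element, so an empty element yields an empty string instead of raising IndexError. The name element_text and the tiny sample document are made up for illustration and are not part of the code above.

```python
from xml.dom.minidom import parseString

def element_text(element):
    # Hypothetical helper: join only the direct text children of the element.
    # An element without any text children gives '' rather than an IndexError.
    parts = [child.data for child in element.childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts).strip()

# Illustrative sample: the second <asset> is deliberately empty.
sample = "<assetParent><asset> pub </asset><asset/></assetParent>"
root = parseString(sample).documentElement
texts = [element_text(asset) for asset in root.getElementsByTagName("asset")]
print(texts)
```

Whether an empty element should become an empty string or be skipped entirely depends on what the calling code expects.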

That said, a list nested three levels deep makes me want to suggest that you use dictionaries instead.

Output:

[[[u'character', u'char'], [u'pub', u'car']], [[u'par', u'pr'], [u'camera', u'rig']]]
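
The dictionary suggestion above might look something like the following sketch. The key names ("type", "shortName", "assets") and the function name get_asset_dicts are only one possible choice, and the code assumes every non-empty asset node contains nothing but text.

```python
from xml.dom.minidom import parseString

xml_text = """<assetsMain>
  <assetParent type='character' shortName='char'>
    <asset>pub</asset>
    <asset>car</asset>
  </assetParent>
  <assetParent type='par' shortName='pr'>
    <asset>camera</asset>
    <asset>rig</asset>
  </assetParent>
</assetsMain>"""

def get_asset_dicts(xml_string):
    # Hypothetical variant of getNestedList: one dictionary per assetParent.
    dom = parseString(xml_string)
    result = []
    for parent in dom.getElementsByTagName("assetParent"):
        result.append({
            "type": parent.getAttribute("type"),
            "shortName": parent.getAttribute("shortName"),
            # Skip empty asset nodes; assumes the first child is a text node.
            "assets": [asset.firstChild.data.strip()
                       for asset in parent.getElementsByTagName("asset")
                       if asset.firstChild is not None],
        })
    return result

records = get_asset_dicts(xml_text)
print(records[0]["assets"])
```

Each record can then be read by name, for example records[1]["shortName"], instead of by position.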

Upvotes: 0

MattH

Reputation: 38247

Here is an answer using lxml.etree. The XPath expressions should be reusable in any other XPath-capable library:

>>> from lxml import etree
>>> data = """<assetsMain>
... <assetParent type='character' shortName='char'>
... <asset>pub</asset>
... <asset>car</asset>
... </assetParent>
... <assetParent type='par' shortName='pr'>
... <asset>camera</asset>
... <asset>rig</asset>
... </assetParent>
... </assetsMain>
... """
>>> doc = etree.XML(data)
>>> for aP in doc.xpath('//assetParent'):
...   parent = aP.attrib['type']
...   for a in aP.xpath('./asset/text()'):
...     print parent, a.strip()
...
character pub
character car
par camera
par rig
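
If lxml is not available, a similar loop can probably be written with the standard library's xml.etree.ElementTree, which only understands a limited subset of XPath but handles plain tag paths like the ones needed here. This is a sketch that builds the nested list from the question rather than printing pairs; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

xml_text = """<assetsMain>
<assetParent type='character' shortName='char'>
<asset>pub</asset>
<asset>car</asset>
</assetParent>
<assetParent type='par' shortName='pr'>
<asset>camera</asset>
<asset>rig</asset>
</assetParent>
</assetsMain>"""

root = ET.fromstring(xml_text)
nested = []
for parent in root.findall("assetParent"):
    # Attribute values first, then the stripped text of each asset child.
    attrs = [parent.get("type"), parent.get("shortName")]
    assets = [(asset.text or "").strip() for asset in parent.findall("asset")]
    nested.append([attrs, assets])
print(nested)
```

The (asset.text or "") guard is there because ElementTree reports the text of an empty element as None.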

Upvotes: 3
