Reputation: 3722
This might be a newbie question :) but it's bugging me since I'm new to XML. I have the following XML file:
<assetsMain>
  <assetParent type='character' shortName='char'>
    <asset>
      pub
    </asset>
    <asset>
      car
    </asset>
  </assetParent>
  <assetParent type='par' shortName='pr'>
    <asset>
      camera
    </asset>
    <asset>
      rig
    </asset>
  </assetParent>
</assetsMain>
Is it possible to retrieve all <assetParent> nodes along with their attributes and the text of their children? For example, to get the following result:
[[['character', 'char'], ['pub', 'car']],
 [['par', 'pr'], ['camera', 'rig']]]
By the way, I'm using DOM and Python 2.6.
Thanks in advance.
Upvotes: 1
Views: 811
Reputation: 52331
This code gives the output you want:
from xml.dom.minidom import parseString

document = """\
<assetsMain>
  <assetParent type='character' shortName='char'>
    <asset>
      pub
    </asset>
    <asset>
      car
    </asset>
  </assetParent>
  <assetParent type='par' shortName='pr'>
    <asset>
      camera
    </asset>
    <asset>
      rig
    </asset>
  </assetParent>
</assetsMain>
"""

def getNestedList():
    dom = parseString(document)
    li = []
    for assetParent in dom.childNodes[0].getElementsByTagName("assetParent"):
        # read the type and shortName attributes
        a = [assetParent.getAttribute("type"), assetParent.getAttribute("shortName")]
        # read the (whitespace-stripped) text content of each asset node
        b = [asset.childNodes[0].data.strip() for asset in assetParent.getElementsByTagName("asset")]
        # pair the two lists and append the pair to the result list
        li.append([a, b])
    return li

if __name__ == "__main__":
    print getNestedList()
Note that getElementsByTagName lets us select which elements to read by tag name. It searches all descendants, not just direct children. Attributes are read with getAttribute on an element. The text inside an element is itself a child Text node, and its content is read through that node's data attribute. If you are reading text inside a node, you can check that it really is text with:
if node.nodeType == node.TEXT_NODE:
Also note that there is no checking or error handling here. An element with no child nodes (for example an empty <asset/>) will raise an IndexError.
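Here is a minimal sketch of a more defensive version, assuming the same document layout. Instead of indexing childNodes[0], the hypothetical helper node_text joins every child whose nodeType is TEXT_NODE. An empty <asset/> therefore yields an empty string rather than an IndexError. The names node_text and get_nested_list are illustrative. The print call is written so the snippet runs on both Python 2.6 and Python 3.

```python
from xml.dom.minidom import parseString

def node_text(element):
    # Hypothetical helper: join the data of all direct Text children,
    # skipping other node types (elements, comments, CDATA sections).
    # An element with no text children, e.g. <asset/>, gives ''.
    parts = [child.data for child in element.childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts).strip()

def get_nested_list(xml_string):
    dom = parseString(xml_string)
    result = []
    for parent in dom.getElementsByTagName("assetParent"):
        attrs = [parent.getAttribute("type"), parent.getAttribute("shortName")]
        texts = [node_text(asset) for asset in parent.getElementsByTagName("asset")]
        result.append([attrs, texts])
    return result

sample = """<assetsMain>
  <assetParent type='par' shortName='pr'>
    <asset> camera </asset>
    <asset/>
  </assetParent>
</assetsMain>"""

# Under Python 3 this prints [[['par', 'pr'], ['camera', '']]]
print(get_nested_list(sample))
```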
That said, a list nested three levels deep makes me want to suggest using dictionaries instead.
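The answer does not specify a dictionary layout, so the following is only one possible shape. It keys each assetParent by its type attribute and stores shortName and the asset texts under named keys. The function names get_asset_dict and text_of and the key "assets" are hypothetical. The sketch also assumes that type values are unique within the file.

```python
from xml.dom.minidom import parseString

def text_of(element):
    # Concatenate the element's direct Text children and strip whitespace.
    return "".join(n.data for n in element.childNodes
                   if n.nodeType == n.TEXT_NODE).strip()

def get_asset_dict(xml_string):
    dom = parseString(xml_string)
    assets = {}
    for parent in dom.getElementsByTagName("assetParent"):
        # Keyed by 'type'; a repeated type would overwrite the earlier entry.
        assets[parent.getAttribute("type")] = {
            "shortName": parent.getAttribute("shortName"),
            "assets": [text_of(a) for a in parent.getElementsByTagName("asset")],
        }
    return assets

doc = """<assetsMain>
<assetParent type='character' shortName='char'>
<asset>pub</asset><asset>car</asset>
</assetParent>
<assetParent type='par' shortName='pr'>
<asset>camera</asset><asset>rig</asset>
</assetParent>
</assetsMain>"""

assets = get_asset_dict(doc)
print(assets["par"]["assets"])
```

Lookups then go by name, e.g. assets['character']['shortName'], instead of by position such as li[0][0][1]. If type values can repeat, a list of such dicts would be the safer shape.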
Output:
[[[u'character', u'char'], [u'pub', u'car']], [[u'par', u'pr'], [u'camera', u'rig']]]
Upvotes: 0
Reputation: 38247
An answer using lxml.etree. The XPath expressions would probably be reusable in any other XPath-capable library:
>>> from lxml import etree
>>> data = """<assetsMain>
... <assetParent type='character' shortName='char'>
... <asset>pub</asset>
... <asset>car</asset>
... </assetParent>
... <assetParent type='par' shortName='pr'>
... <asset>camera</asset>
... <asset>rig</asset>
... </assetParent>
... </assetsMain>
... """
>>> doc = etree.XML(data)
>>> for aP in doc.xpath('//assetParent'):
...     parent = aP.attrib['type']
...     for a in aP.xpath('./asset/text()'):
...         print parent, a.strip()
...
character pub
character car
par camera
par rig
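If lxml is not installed, a similar result can be sketched with the standard library's xml.etree.ElementTree (available since Python 2.5). Its findall() supports only a limited subset of XPath, but that subset is enough here. The sketch below also reads shortName, so it builds the nested list format asked for in the question. It is an alternative approach, not part of the lxml answer above.

```python
import xml.etree.ElementTree as ET

data = """<assetsMain>
<assetParent type='character' shortName='char'>
<asset>pub</asset>
<asset>car</asset>
</assetParent>
<assetParent type='par' shortName='pr'>
<asset>camera</asset>
<asset>rig</asset>
</assetParent>
</assetsMain>"""

root = ET.fromstring(data)
result = []
# './/assetParent' matches assetParent elements at any depth below the root.
for ap in root.findall(".//assetParent"):
    attrs = [ap.get("type"), ap.get("shortName")]
    # 'asset' matches direct children only; .text is None for empty elements.
    texts = [(a.text or "").strip() for a in ap.findall("asset")]
    result.append([attrs, texts])

print(result)
```

Note that .text only holds the text before an element's first child. That is fine for leaf elements like <asset>.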
Upvotes: 3