Reputation: 5938
Using xml.etree (this module, please), how could I parse:
<?xml version="1.0" encoding="UTF-8"?>
<VersionedFile EntityPath="c:\a.zip" Name="a.zip">
  <WorkfileDescription>something</WorkfileDescription>
  <Revision EntityPath="c:\a.zip" Name="1.1" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <PGROUP Name="B" />
    <PGROUP Name="C" />
    <Label Name="SOFTWARE" />
    <Label Name="READY" />
  </Revision>
  <Revision EntityPath="c:\a.zip" Name="1.0" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <Label Name="GAME" />
    <Label Name="READY" />
  </Revision>
</VersionedFile>
in order to get:
Revision: a.zip
Name: 1.1
Author: me
ChangeDescription: Some comentary
PGROUP: A
PGROUP: B
PGROUP: C
Label: SOFTWARE
Label: READY
Revision: a.zip
Name: 1.0
Author: me
ChangeDescription: Some comentary
PGROUP: A
Label: GAME
Label: READY
So far, with the following code I can only get the Revision lines; I'm struggling to parse the child elements:
from xml.etree import ElementTree
try:
    tree = ElementTree.parse(self.xml)
    root = tree.getroot()
    info_list = []
    for child in root:
        print(child.tag, child.attrib)
except Exception:
    raise
finally:
    self.xml = None
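The loop above stops at the Revision lines because iterating over root visits only its direct children (WorkfileDescription and the Revision elements); ChangeDescription, PGROUP and Label sit one level deeper, so each Revision has to be iterated as well. A minimal Python 3 sketch of the two levels, assuming the XML is held in a string (SAMPLE is a hypothetical name and a trimmed copy of the question's data) rather than read from self.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed copy of the question's XML
# (raw string so the backslash in the path is kept literally).
SAMPLE = r"""<VersionedFile EntityPath="c:\a.zip" Name="a.zip">
  <WorkfileDescription>something</WorkfileDescription>
  <Revision EntityPath="c:\a.zip" Name="1.0" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <Label Name="GAME" />
  </Revision>
</VersionedFile>"""

root = ET.fromstring(SAMPLE)

# Level 1: only the direct children of the root element.
top_level = [child.tag for child in root]
print(top_level)  # ['WorkfileDescription', 'Revision']

# Level 2: the children of each Revision element.
for revision in root.findall('Revision'):
    nested = [child.tag for child in revision]
    print(nested)  # ['ChangeDescription', 'PGROUP', 'Label']
```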
Upvotes: 1
Views: 348
Reputation: 5938
Based on alecxe's solution, I was able to get it working with:
from xml.etree import ElementTree
try:
    tree = ElementTree.parse(self.xml)
    info_list = []
    for revision in tree.findall('Revision'):
        for key, value in revision.attrib.iteritems():
            values = dict()
            values[key] = value
            info_list.append(values)
            #print "%s: %s" % (key, value)
        for child in revision:
            values = dict()
            # this is needed to match the change description field.
            if child.tag == 'ChangeDescription':
                values[child.tag] = child.text
                #print "%s: %s" % (child.tag, child.text)
            else:
                values[child.tag] = child.attrib.get('Name', '')
                #print "%s: %s" % (child.tag, child.attrib.get('Name', ''))
            info_list.append(values)
        print
    for i in info_list:
        print(i)
except Exception:
    raise
finally:
    self.xml = None
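The code above is Python 2 (iteritems() and the bare print statement do not exist in Python 3). A minimal Python 3 sketch of the same approach that prints exactly the format asked for in the question. It assumes the "Revision:" line should show the file name taken from EntityPath, which the desired output implies but the question does not state; format_revisions and SAMPLE are hypothetical names:

```python
import ntpath
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's XML as a raw string.
SAMPLE = r"""<VersionedFile EntityPath="c:\a.zip" Name="a.zip">
  <WorkfileDescription>something</WorkfileDescription>
  <Revision EntityPath="c:\a.zip" Name="1.1" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <PGROUP Name="B" />
    <PGROUP Name="C" />
    <Label Name="SOFTWARE" />
    <Label Name="READY" />
  </Revision>
  <Revision EntityPath="c:\a.zip" Name="1.0" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <Label Name="GAME" />
    <Label Name="READY" />
  </Revision>
</VersionedFile>"""


def format_revisions(root):
    """Return the report lines for every Revision under root (hypothetical helper)."""
    lines = []
    for revision in root.findall('Revision'):
        # Assumption: "Revision:" shows the file-name part of EntityPath;
        # ntpath splits Windows-style backslash paths on any OS.
        lines.append('Revision: %s' % ntpath.basename(revision.get('EntityPath', '')))
        lines.append('Name: %s' % revision.get('Name', ''))
        lines.append('Author: %s' % revision.get('Author', ''))
        for child in revision:
            if child.tag == 'ChangeDescription':
                # The description is the element's text, not an attribute.
                lines.append('%s: %s' % (child.tag, (child.text or '').strip()))
            else:
                # PGROUP and Label carry their value in the Name attribute.
                lines.append('%s: %s' % (child.tag, child.get('Name', '')))
    return lines


for line in format_revisions(ET.fromstring(SAMPLE)):
    print(line)
```

Returning a list of lines instead of printing directly keeps the formatting testable and lets the caller decide where the output goes.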
Upvotes: 0
Reputation: 474191
Find all Revision tags, print all attributes from element.attrib, then iterate over each Revision element to get its children and their Name attribute values:
import xml.etree.ElementTree as etree

data = """<?xml version="1.0" encoding="UTF-8"?>
<VersionedFile EntityPath="c:\\a.zip" Name="VfOMP_CRM.zip">
  <WorkfileDescription>something</WorkfileDescription>
  <Revision EntityPath="c:\\a.zip" Name="1.1" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <PGROUP Name="B" />
    <PGROUP Name="C" />
    <Label Name="SOFTWARE" />
    <Label Name="READY" />
  </Revision>
  <Revision EntityPath="c:\\a.zip" Name="1.0" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <Label Name="GAME" />
    <Label Name="READY" />
  </Revision>
</VersionedFile>
"""

tree = etree.fromstring(data)
for revision in tree.findall('Revision'):
    for key, value in revision.attrib.iteritems():
        print "%s: %s" % (key, value)
    for child in revision:
        print "%s: %s" % (child.tag, child.attrib.get('Name', ''))
    print
prints:
Name: 1.1
EntityPath: c:\a.zip
Author: me
ChangeDescription:
PGROUP: A
PGROUP: B
PGROUP: C
Label: SOFTWARE
Label: READY
Name: 1.0
EntityPath: c:\a.zip
Author: me
ChangeDescription:
PGROUP: A
Label: GAME
Label: READY
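You may need to tweak it a bit to get the desired output (for example, ChangeDescription prints empty because its value is the element's text, not a Name attribute), but this should give you the basic idea.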
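If the values are needed for further processing rather than printing, a variation is to query each child type explicitly with findtext() and findall() and collect one dict per revision. This is a Python 3 sketch, not part of the answer above; the helper revision_info, the dict layout and SAMPLE are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one Revision from the question's XML.
SAMPLE = r"""<VersionedFile EntityPath="c:\a.zip" Name="a.zip">
  <WorkfileDescription>something</WorkfileDescription>
  <Revision EntityPath="c:\a.zip" Name="1.1" Author="me">
    <ChangeDescription>Some comentary</ChangeDescription>
    <PGROUP Name="A" />
    <PGROUP Name="B" />
    <Label Name="SOFTWARE" />
  </Revision>
</VersionedFile>"""


def revision_info(revision):
    """Collect one Revision element into a plain dict (hypothetical layout)."""
    info = dict(revision.attrib)  # EntityPath, Name, Author
    # findtext returns the text of the first matching child, or the default.
    info['ChangeDescription'] = revision.findtext('ChangeDescription', default='')
    # Repeated children become lists of their Name attributes.
    info['PGROUP'] = [pgroup.get('Name') for pgroup in revision.findall('PGROUP')]
    info['Label'] = [label.get('Name') for label in revision.findall('Label')]
    return info


revisions = [revision_info(r) for r in ET.fromstring(SAMPLE).findall('Revision')]
print(revisions)
```

Grouping repeated tags into lists avoids the one-entry-per-dict layout of info_list, at the cost of losing the original document order between PGROUP and Label elements.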
Upvotes: 2