Reputation: 53
I am new to both XML parsing and Python. I need to reach the sub-elements of the tree and print all of them.
I have an XML file structured like the one below. Here is my file: https://gofile.io/?c=OXcdue
My requirement is to read all the queues that have subqueues, together with those subqueues.
Upvotes: 0
Views: 1731
Reputation: 23815
The code below uses only the standard library (no external dependencies):
import xml.etree.ElementTree as ET
xml = '''<allocations>
<queue name="bdpaas_express_q1">
<minResources>12000 mb,2 vcores,1 disks</minResources>
<maxResources>18000 mb,3 vcores,2 disks</maxResources>
<aclSubmitApps> xyz</aclSubmitApps>
<aclAdministerApps> xyz</aclAdministerApps>
<label>allnodes</label>
</queue>
<queue name="dl_priority_q1">
<minResources>8496000 mb,1416 vcores,108 disks</minResources>
<maxResources>12768000 mb,2128 vcores,162 disks</maxResources>
<aclSubmitApps> dla_grp</aclSubmitApps>
<aclAdministerApps> dla_grp</aclAdministerApps>
<label>fastnodes</label>
</queue>
<queue name="pireporting_q1">
<minResources>6960000 mb,1160 vcores,87 disks</minResources>
<maxResources>10440000 mb,1740 vcores,130 disks</maxResources>
<queue name="atscale_rtam_mr_sq1">
<minResources>6000000 mb,1000 vcores,75 disks</minResources>
<maxResources>9000000 mb,1500 vcores,112 disks</maxResources>
<aclSubmitApps> atscalep</aclSubmitApps>
<aclAdministerApps> atscalep</aclAdministerApps>
<label>allnodes</label>
</queue>
<queue name="atscale_spark_sq1">
<minResources>960000 mb,160 vcores,12 disks</minResources>
<maxResources>1440000 mb,240 vcores,18 disks</maxResources>
<aclSubmitApps> atscalep</aclSubmitApps>
<aclAdministerApps> atscalep</aclAdministerApps>
<label>allnodes</label>
</queue>
</queue>
<queuePlacementPolicy>
<rule create="false" name="specified" />
<rule name="reject" />
</queuePlacementPolicy>
</allocations>
'''
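root = ET.fromstring(xml)
# Find every <queue> element at any depth.
queues = root.findall('.//queue')
for queue in queues:
    # Compare with None: an Element that has no children is falsy.
    if queue.find('./queue') is not None:
        # tostring() returns bytes here, so decode before printing.
        print(ET.tostring(queue, encoding='utf8', method='xml').decode('utf8'))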
Output:
<?xml version="1.0" encoding="UTF-8"?>
<queue name="pireporting_q1">
<minResources>6960000 mb,1160 vcores,87 disks</minResources>
<maxResources>10440000 mb,1740 vcores,130 disks</maxResources>
<queue name="atscale_rtam_mr_sq1" />
<queue name="atscale_spark_sq1" />
</queue>
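If you want the queue names as a hierarchy rather than raw XML, a small recursive helper can follow the nesting. This is only a sketch: `SAMPLE` is a shortened stand-in for the XML above, and `queue_tree` and `print_tree` are names I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same nesting as the XML above (resource tags omitted).
SAMPLE = """<allocations>
  <queue name="bdpaas_express_q1"><label>allnodes</label></queue>
  <queue name="pireporting_q1">
    <queue name="atscale_rtam_mr_sq1"><label>allnodes</label></queue>
    <queue name="atscale_spark_sq1"><label>allnodes</label></queue>
  </queue>
</allocations>"""


def queue_tree(element):
    """Map each direct child queue's name to the tree of its own sub-queues."""
    return {q.get("name"): queue_tree(q) for q in element.findall("queue")}


def print_tree(tree, depth=0):
    """Print queue names, indenting two spaces per nesting level."""
    for name, children in tree.items():
        print("  " * depth + name)
        print_tree(children, depth + 1)


root = ET.fromstring(SAMPLE)
tree = queue_tree(root)
# Keep only the top-level queues that actually have sub-queues.
parents = {name: subs for name, subs in tree.items() if subs}
print_tree(parents)
```

Because `queue_tree` calls itself on every child queue, it should handle sub-queues of sub-queues to any depth, not just the two levels in this sample.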
Upvotes: 0
Reputation: 408
You can use the lxml library to parse any XML content. It offers more than the standard library's xml.etree.ElementTree module, such as full XPath support and easier access to the namespaces of the XML document if necessary (not needed in your case).
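from lxml import etree

# path_to_xml_file is a placeholder for the path to your XML file.
tree = etree.parse(path_to_xml_file)
root = tree.getroot()
# Iterate over the root's direct children (getchildren() is deprecated).
for children in root:
    print(children.tag)
    # Print the tag and text of each element nested one level deeper.
    for child in children:
        print(child.tag, child.text)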
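Refer to the xml.etree.ElementTree documentation for more information on how to access the various parts of your XML file and how to find all sub-elements recursively. That documentation covers the standard library module, but it largely applies to lxml as well, because lxml.etree implements a compatible ElementTree API (lxml itself is built on the libxml2 C library, not on the standard xml package).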
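As a rough illustration of finding sub-elements recursively, `iter()` visits matching elements at any depth, while `findall()` with a plain tag name only looks at direct children. The sketch below uses the standard library so it runs without extra installs; as far as I know the same calls work with `from lxml import etree as ET`. `SAMPLE` is a shortened, made-up stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the real allocations file.
SAMPLE = """<allocations>
  <queue name="dl_priority_q1"><label>fastnodes</label></queue>
  <queue name="pireporting_q1">
    <queue name="atscale_spark_sq1"><label>allnodes</label></queue>
  </queue>
</allocations>"""

root = ET.fromstring(SAMPLE)

# iter() walks the whole tree in document order, at any depth.
all_names = [q.get("name") for q in root.iter("queue")]

# Pair each queue with the names of its direct sub-queues.
pairs = [
    (parent.get("name"), child.get("name"))
    for parent in root.iter("queue")
    for child in parent.findall("queue")
]

print(all_names)
print(pairs)
```

Queues without sub-queues simply contribute nothing to `pairs`, so the list ends up holding only parent/sub-queue relationships.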
Upvotes: 1