Reputation: 1037
I have an XML file that I want to convert to a dictionary. I tried the code below, but the output is not what I expected. The file is named core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdfs/tmp</value>
    <description>Temporary Directory.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.XXX.X.XXX:XXXX</value>
    <description>Use HDFS as file storage engine</description>
  </property>
</configuration>
The code that I wrote is:
import xml.etree.ElementTree as ET
import warnings

warnings.filterwarnings("ignore")


class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})


tree = ET.parse('core-site.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)
print(xmldict)
This is the output that I am getting:
{
'property':
{
'name': 'fs.defaultFS',
'value': 'hdfs://192.X.X.X:XXXX',
'description': 'Use HDFS as file storage engine'
}
}
Why isn't the first property tag being shown? It only shows the data in the last property tag.
Upvotes: 0
Views: 448
Reputation: 3725
Since you are using a dict, the second element with the same key (property) replaces the first element already stored in the dict.
You have to use a different data structure, for instance a list of dicts.
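A minimal sketch of that idea follows. It assumes each property element has only simple text children such as name, value and description. The helper name properties_to_list is made up for illustration and is not part of ElementTree. The XML is inlined so the snippet runs on its own; with the real file you would get the root from ET.parse('core-site.xml').getroot() instead.

```python
import xml.etree.ElementTree as ET

# Sample document inlined so the sketch is self-contained.
XML = """<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdfs/tmp</value>
    <description>Temporary Directory.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.XXX.X.XXX:XXXX</value>
    <description>Use HDFS as file storage engine</description>
  </property>
</configuration>"""


def properties_to_list(root):
    """Return one dict per <property> element, in document order.

    Hypothetical helper: assumes every child of <property> holds plain text.
    """
    return [
        {child.tag: (child.text or "").strip() for child in prop}
        for prop in root.findall("property")
    ]


root = ET.fromstring(XML)

# A list keeps every <property>, because nothing is keyed by the repeated tag.
props = properties_to_list(root)
print(props)

# Optional: if the names are unique, a dict keyed by <name> also works.
by_name = {p["name"]: p["value"] for p in props}
print(by_name)
```

The list keeps both entries because no key is shared between them. The name-to-value dict is only safe if no two properties have the same name; otherwise the same overwrite happens again.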
Upvotes: 2