Reputation: 13
I have a file called core-site.xml:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/centos/hadoop_tmp/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://test:9000</value>
    </property>
</configuration>
How can I get a dict in Python like this:
{'hadoop.tmp.dir': 'file:/home/centos/hadoop_tmp/tmp', 'fs.defaultFS': 'hdfs://test:9000'}
Upvotes: 0
Views: 1143
Reputation: 2891
The question already has an accepted answer, but since I commented on it, I wanted to give an example of how to use one of the modules I suggested.
xml = '''<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/centos/hadoop_tmp/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://test:9000</value>
</property>
</configuration>'''
import xmltodict

# Load the XML string into a dictionary-like object
test = xmltodict.parse(xml)
# Create a temporary dictionary where we will store the parsed data
temp_dict = {}
# Walk the resulting structure
for name in test:
    # Check that we have the needed 'property' key before doing any processing on the leaf
    if 'property' in test[name].keys():
        # For each property leaf
        for property in test[name]['property']:
            # If the leaf has the fields you need, print them
            if 'name' in property.keys():
                print('Found name', property['name'])
            if 'value' in property.keys():
                print('With value', property['value'])
            # Then save it to the temporary dictionary in the form you need
            # Note that if you have duplicate "name" strings, only the last "value" will be saved
            temp_dict.update({property['name']: property['value']})
print(temp_dict)
And here's the output:
Found name hadoop.tmp.dir
With value file:/home/centos/hadoop_tmp/tmp
Found name fs.defaultFS
With value hdfs://test:9000
{'hadoop.tmp.dir': 'file:/home/centos/hadoop_tmp/tmp', 'fs.defaultFS': 'hdfs://test:9000'}
Upvotes: 0
Reputation: 1259
You can use the ElementTree module from the Python standard library, which is documented here: https://docs.python.org/2/library/xml.etree.elementtree.html
First, parse the .xml file with ElementTree:
import xml.etree.ElementTree as ET
tree = ET.parse('core-site.xml')
root = tree.getroot()
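If the configuration is already in memory as a string rather than in a file, a minimal sketch is to use ET.fromstring, which returns the root element directly. The variable name xml_text is only for illustration, and the XML is copied from the question:

```python
import xml.etree.ElementTree as ET

# Illustrative XML text, copied from the question
xml_text = """<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/centos/hadoop_tmp/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://test:9000</value>
</property>
</configuration>"""

# fromstring() parses the text and returns the root Element,
# so there is no separate getroot() call
root = ET.fromstring(xml_text)
print(root.tag)
```

Either way you end up with a root element that you can query with findall().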
Once that is done, you can use the root object to walk the XML document:
for entry in root.findall('property'):
    ...
Within this loop, you can extract the name and value of each property:
for entry in root.findall('property'):
    name = entry.find('name').text
    value = entry.find('value').text
    print(name)
    print(value)
You want to add this to a dictionary, which should be as simple as:
configuration = dict()
for entry in root.findall('property'):
    name = entry.find('name').text
    value = entry.find('value').text
    configuration[name] = value
You then have a dictionary containing all of your XML configuration properties. Putting it all together:
import xml.etree.ElementTree as ET

tree = ET.parse('core-site.xml')
root = tree.getroot()
configuration = dict()
for entry in root.findall('property'):
    name = entry.find('name').text
    value = entry.find('value').text
    configuration[name] = value
print(configuration)
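One caveat: entry.find() returns None when a child element is missing, so the loop above raises an AttributeError on a property that has no name or value. A slightly more defensive sketch, assuming you would rather skip such entries, could use findtext() instead. The function name load_properties and the incomplete.entry property are hypothetical, and the XML is parsed from a string here only so the example runs on its own:

```python
import xml.etree.ElementTree as ET


def load_properties(root):
    """Collect name/value pairs from the <property> children of root."""
    configuration = {}
    for entry in root.findall('property'):
        # findtext() returns None instead of raising when the child is absent
        name = entry.findtext('name')
        value = entry.findtext('value')
        if name is None or value is None:
            continue  # assumption: incomplete properties are skipped
        configuration[name] = value
    return configuration


# Hypothetical sample: the second property deliberately has no <value>
sample = """<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://test:9000</value>
</property>
<property>
<name>incomplete.entry</name>
</property>
</configuration>"""

config = load_properties(ET.fromstring(sample))
print(config)
```

For a file on disk you would pass ET.parse('core-site.xml').getroot() to the function instead.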
Upvotes: 2