Reputation: 1170
I have a document with this structure:
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<entry>
<term>word_1</term>
<opinion source="data1" polarity="0.10" />
<opinion source="data2" polarity="0.4" />
</entry>
<entry>
<term>word_2</term>
<opinion source="data1" polarity="1.0" />
<opinion source="data2" polarity="-0.16666667" />
<opinion source="data3" polarity="0.004" />
</entry>
<entry>
<term>word_3</term>
<opinion source="data1" polarity="0.6" />
<opinion source="data2" polarity="0.0" />
</entry>
</entries>
I have never worked with XML before and it is proving to be a pain. I want to extract the words, their polarity and the source. Ideally, coming from this example, I would have three dictionaries named after the source (I know exactly how many different sources there are, so manually naming the dictionaries is not a problem) that would hold the words as keys and the polarities as values, i.e.,
data1 = {'word_1': 0.10, 'word_2': 1.0, 'word_3': 0.6}
data2 = {'word_1': 0.4, 'word_2': -0.16666667, 'word_3': 0.0}
data3 = {'word_2': 0.004}
The problem is that I don't really understand how to iterate over this structure. I can iterate over <term> like so:
import xml.etree.ElementTree as ET
tree = ET.parse('my.xml')
root = tree.getroot()
for term in root.iter('term'):
    print(term.text)
Out:
word_1
word_2
word_3
But I can't get to the source and polarity items.
Any help is appreciated. Thanks.
Upvotes: 1
Views: 54
Reputation: 35348
Have a look at this; I think you should be able to follow how it works.
import xml.etree.ElementTree as ET
data = {}
tree = ET.parse('test.xml')
root = tree.getroot()
for entry in root.iter('entry'):
    term = entry.find('term')
    for opinion in entry.iter('opinion'):
        # one inner dict per source, created the first time that source is seen
        termDict = data.setdefault(opinion.get('source'), {})
        # attribute values are strings; wrap in float() if numbers are needed
        termDict[term.text] = opinion.get('polarity')
for k, v in data.items():
    print(k, v)
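If the polarities are needed as numbers, as in the dictionaries the question asks for, the same loop could convert each attribute with float(). The following is only a sketch under a few assumptions: the XML is inlined as a string so it runs without a file, and the names SAMPLE and build_polarity_dicts are made up for illustration and do not come from the original posts.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's document (assumption: used instead of a file on disk).
SAMPLE = """<entries>
<entry>
<term>word_1</term>
<opinion source="data1" polarity="0.10" />
<opinion source="data2" polarity="0.4" />
</entry>
<entry>
<term>word_2</term>
<opinion source="data1" polarity="1.0" />
<opinion source="data2" polarity="-0.16666667" />
<opinion source="data3" polarity="0.004" />
</entry>
<entry>
<term>word_3</term>
<opinion source="data1" polarity="0.6" />
<opinion source="data2" polarity="0.0" />
</entry>
</entries>"""


def build_polarity_dicts(root):
    """Hypothetical helper: map each source name to a {term: polarity} dict."""
    data = {}
    for entry in root.iter('entry'):
        term = entry.find('term').text
        for opinion in entry.iter('opinion'):
            # get() returns the attribute as a string, so convert it explicitly
            polarity = float(opinion.get('polarity'))
            data.setdefault(opinion.get('source'), {})[term] = polarity
    return data


data = build_polarity_dicts(ET.fromstring(SAMPLE))
for source, polarities in sorted(data.items()):
    print(source, polarities)
```

The separately named dictionaries from the question can then be read out of the result, for example data1 = data['data1'].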
Upvotes: 2
Reputation: 1963
You want something like this:
import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('test.xml').getroot()
for node in e.iter('entry'):  # iterate over each entry node
    for child in node:
        print(child.tag)  # name of the child: 'term' or 'opinion'
        if child.tag == 'opinion':  # <term> has no attributes, so skip it here
            print(child.attrib['polarity'], child.attrib['source'])  # polarity and source
Note that child.attrib gives you a dict of the attributes of that particular node, so you can look up any attribute by name.
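To make the attrib dict concrete, here is a small sketch that parses two single elements from strings (the element strings are copied from the question; the variable names are only for illustration). It also shows that an element without attributes, such as <term>, has an empty dict, which is why an indexed lookup on it would raise KeyError while get() returns None.

```python
import xml.etree.ElementTree as ET

opinion = ET.fromstring('<opinion source="data1" polarity="0.10" />')
attrs = opinion.attrib  # plain dict: attribute name -> string value
print(attrs['source'], attrs['polarity'])

term = ET.fromstring('<term>word_1</term>')
# <term> carries text but no attributes, so its attrib dict is empty.
# Element.get() returns None for a missing attribute instead of raising.
missing = term.get('polarity')
print(term.attrib, missing)
```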
Upvotes: 1