Zlo

Reputation: 1170

Iterating over xml document

I have a document with this structure:

<?xml version="1.0" encoding="UTF-8"?>
<entries>
  <entry>
    <term>word_1</term>
    <opinion source="data1" polarity="0.10" />
    <opinion source="data2" polarity="0.4" />
  </entry>
  <entry>
    <term>word_2</term>
    <opinion source="data1" polarity="1.0" />
    <opinion source="data2" polarity="-0.16666667" />
    <opinion source="data3" polarity="0.004" />
 </entry>
 <entry>
    <term>word_3</term>
    <opinion source="data1" polarity="0.6" />
    <opinion source="data2" polarity="0.0" />
 </entry>
</entries>

I have never worked with XML before, and it is proving to be a pain. I want to extract the words, their polarity, and the source. Ideally, going from this example, I would end up with three dictionaries named after the sources (I know exactly how many different sources there are, so naming the dictionaries manually is not a problem), each holding the words as keys and the polarities as values, i.e.:

data1 = {'word_1': 0.10, 'word_2': 1.0, 'word_3': 0.6}
data2 = {'word_1': 0.4, 'word_2': -0.16666667, 'word_3': 0.0}
data3 = {'word_2': 0.004}

The problem is that I don't really understand how to iterate over this structure. I can iterate over <term> like so:

import xml.etree.ElementTree as ET
tree = ET.parse('my.xml')
root = tree.getroot()

for term in root.iter('term'):
    print(term.text)


Out:
word_1
word_2
word_3

But I can't get to the source and polarity items. Any help is appreciated. Thanks.

Upvotes: 1

Views: 54

Answers (2)

DAXaholic

Reputation: 35348

Have a look at this; I think you should be able to follow how it works.

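import xml.etree.ElementTree as ET

data = {}  # maps each source to a {term: polarity} dictionary
tree = ET.parse('test.xml')
root = tree.getroot()

for entry in root.iter('entry'):
    term = entry.find('term')
    for opinion in entry.iter('opinion'):
        # Create the per-source dictionary the first time a source is seen
        termDict = data.setdefault(opinion.get('source'), {})
        # Attribute values are strings, so convert the polarity to a float
        termDict[term.text] = float(opinion.get('polarity'))

for k, v in data.items():
    print(k, v)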
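
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The names SAMPLE_XML and polarities_by_source are made up for illustration, the XML is a shortened copy of the question's document, and the conversion to float assumes every polarity attribute holds a valid number.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the question's document (hypothetical inline sample)
SAMPLE_XML = """<entries>
  <entry>
    <term>word_1</term>
    <opinion source="data1" polarity="0.10" />
    <opinion source="data2" polarity="0.4" />
  </entry>
  <entry>
    <term>word_2</term>
    <opinion source="data1" polarity="1.0" />
    <opinion source="data2" polarity="-0.16666667" />
    <opinion source="data3" polarity="0.004" />
  </entry>
</entries>"""


def polarities_by_source(root):
    """Return {source: {term: polarity}} for every <entry> under root."""
    by_source = defaultdict(dict)
    for entry in root.iter('entry'):
        word = entry.findtext('term')  # text of the <term> child
        for opinion in entry.findall('opinion'):
            # Attributes are strings; assume polarity is always numeric
            by_source[opinion.get('source')][word] = float(opinion.get('polarity'))
    return dict(by_source)


root = ET.fromstring(SAMPLE_XML)
result = polarities_by_source(root)

# Pull out the per-source dictionaries the question asks for
data1 = result['data1']
data2 = result['data2']
data3 = result['data3']
print(data1)
print(data2)
print(data3)
```

Keeping everything in one dictionary keyed by source means a new source in the file needs no code change; the named variables at the end are only there to match the layout asked for in the question.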

Upvotes: 2

AndrewSmiley

Reputation: 1963

You want something like this:

import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('test.xml').getroot()
for node in e.iter('entry'):  # iterate over each entry node
    for child in node.findall('opinion'):  # skip <term>, which has no attributes
        print(child.tag)  # get the name of the child
        print(child.attrib['polarity'], child.attrib['source'])  # get the polarity and source

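Note that child.attrib is a dict of the attributes of that particular node, so indexing it with child.attrib['polarity'] raises a KeyError on any child that lacks the attribute, such as <term>.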
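
To make the difference concrete, here is a small sketch comparing attrib with Element.get() on a single entry. The inline XML string and the variable names are only illustrative; get() returns None (or a default you pass in) when the attribute is missing, rather than raising.

```python
import xml.etree.ElementTree as ET

# One entry from the question's document, inlined for illustration
entry = ET.fromstring(
    '<entry><term>word_1</term>'
    '<opinion source="data1" polarity="0.10" /></entry>'
)
term = entry.find('term')
opinion = entry.find('opinion')

# attrib is a plain dict of attribute name -> string value
term_attrs = term.attrib          # empty: <term> carries no attributes
opinion_attrs = opinion.attrib    # holds both source and polarity

# get() avoids the KeyError that term.attrib['polarity'] would raise
missing = term.get('polarity')            # None when the attribute is absent
fallback = term.get('polarity', '0.0')    # or a default of your choosing

print(term_attrs, opinion_attrs, missing, fallback)
```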

Upvotes: 1
