Mike

Reputation: 7831

Python XML to dictionary to iterate over items

I have the following XML example

<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>

I need to iterate over each tag in a for loop in Python. I've tried many things but I just can't get it to work.

Thanks for the help.

Upvotes: 5

Views: 6491

Answers (4)

Esteban Küber

Reputation: 36832

I personally use xml.etree.cElementTree, as I've found it works really well: it's fast, easy to use, and copes with big (>2GB) files.

import xml.etree.cElementTree as etree

with open(xml_file_path, 'rb') as xml_file:
    # iterparse yields (event, element) pairs; by default only "end" events
    for event, elem in etree.iterparse(xml_file):
        if elem.tag == 'item':
            print(elem.text)

In the interactive console

>>> x="""<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>"""
>>> x
'<?xml version="1.0"?>\n<test>\n    <items>\n        <item>item 1</item>\n        <item>item 2</item>\n    </items>\n</test>'
>>> import xml.etree.cElementTree as etree
>>> tree = etree.fromstring(x)
>>> tree
<Element 'test' at 0xb63ad248>
>>> for i in tree:
        for j in i:
            print j


<Element 'item' at 0xb63ad2f0>
<Element 'item' at 0xb63ad338>
>>> for i in tree:
        for j in i:
            j.text

'item 1'
'item 2'
>>>
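If only the item elements matter, the nested loops can be avoided. The following is a minimal sketch, assuming Python 3 and the plain xml.etree.ElementTree module (cElementTree was deprecated in Python 3.3 and later removed). The helper name item_texts is purely illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>"""


def item_texts(xml_string):
    """Return the text of every <item> element, in document order."""
    root = ET.fromstring(xml_string)
    # iter() walks the whole subtree, so the nesting depth doesn't matter
    return [elem.text for elem in root.iter("item")]


for text in item_texts(XML):
    print(text)
```

Element.iter() matches by tag name anywhere below the root, which is convenient when you don't want to depend on the exact items/item nesting.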

Upvotes: 7

pajton

Reputation: 16226

Try the XML parser from the xml.sax package in the standard library.

from xml.sax import parse
from xml.sax.handler import ContentHandler
from sys import argv

class Handler(ContentHandler):
    def startElementNS(self, name, qname, attrs):
        self.startElement(name, attrs)

    def endElementNS(self, name, qname):
        self.endElement(name)

    def startElement(self, name, attrs):
        pass  # ... do whatever you like on tag start ...

    def characters(self, content):
        pass  # ... on tag content ...

    def endElement(self, name):
        pass  # ... on tag closing ...

if __name__ == "__main__":
    parse(argv[1], Handler())

Here I assumed argv[1] is the path to the file you'd like to parse (the first argument to the parse() function is a filename or a stream). It is easy to convert this to a for loop: just grab all the information you need in the methods above and push it onto a list or stack, then iterate over that once parsing has finished.
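The "collect into a list, then loop" idea could look roughly like this. It is only a sketch, assuming Python 3 and the sample XML from the question; the class name ItemCollector and its attributes are made up for illustration, and parseString is used instead of parse so the example doesn't need a file on disk.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

XML = b"""<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>"""


class ItemCollector(ContentHandler):
    """Collect the text of every <item> element into self.items."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None means we are not inside an <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


handler = ItemCollector()
parseString(XML, handler)

# Parsing is finished, so the collected values can be looped over normally
for text in handler.items:
    print(text)
```

Buffering the chunks and joining them in endElement matters because a SAX parser is allowed to deliver one element's text in several characters() calls.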

Upvotes: 1

flaxeater

Reputation: 690

You would probably like to use something like ElementTree. This is a well-renowned library; I have not personally used it, but I always hear good things about it.

Also, as of Python 2.5 it's part of the standard library.

Upvotes: 0

YOU

Reputation: 123821

import xml.dom.minidom as md

x='''<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>
'''

xml=md.parseString(x)

items=xml.getElementsByTagName("item")
# [<DOM Element: item at 0xc16e40>, <DOM Element: item at 0xc16ee0>]

Since items is a list-like collection of DOM Elements (a NodeList), you can loop over it with for.
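A minimal sketch of that loop, assuming Python 3 and that every item element contains exactly one text node (true for the sample XML, but not guaranteed in general). The helper name item_texts is illustrative only.

```python
import xml.dom.minidom as md

XML = """<?xml version="1.0"?>
<test>
    <items>
        <item>item 1</item>
        <item>item 2</item>
    </items>
</test>
"""


def item_texts(xml_string):
    """Return the text content of each <item> element."""
    doc = md.parseString(xml_string)
    texts = []
    for node in doc.getElementsByTagName("item"):
        # Assumes the first child is the element's only text node
        texts.append(node.firstChild.data)
    return texts


for text in item_texts(XML):
    print(text)
```

For an empty item element, firstChild would be None, so real code would need a check before reading .data.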

Upvotes: 1
