Reputation: 7831
I have the following XML example
<?xml version="1.0"?>
<test>
<items>
<item>item 1</item>
<item>item 2</item>
</items>
</test>
I need to iterate over each tag in a for loop in python. If tried many things but I just can't get it..
thanks for the help
Upvotes: 5
Views: 6491
Reputation: 36832
I personally use xml.etree.cElementTree
, as I've found it works really well, it's fast, easy to use, and works well with big (>2GB) files.
import xml.etree.cElementTree as etree
with open(xml_file_path) as xml_file:
tree = etree.iterparse(xml_file)
for items in tree:
for item in items:
print item.text
In the interactive console
>>> x="""<?xml version="1.0"?>
<test>
<items>
<item>item 1</item>
<item>item 2</item>
</items>
</test>"""
>>> x
'<?xml version="1.0"?>\n<test>\n <items>\n <item>item 1</item>\n <item>item 2</item>\n </items>\n</test>'
>>> import xml.etree.cElementTree as etree
>>> tree = etree.fromstring(x)
>>> tree
<Element 'test' at 0xb63ad248>
>>> for i in tree:
for j in i:
print j
<Element 'item' at 0xb63ad2f0>
<Element 'item' at 0xb63ad338>
>>> for i in tree:
for j in i:
j.text
'item 1'
'item 2'
>>>
Upvotes: 7
Reputation: 16226
Try xml parser from xml.sax
package in standard library.
from xml.sax import parse from xml.sax.handler import ContentHandler from sys import argv class Handler(ContentHandler): def startElementNS(self, name, qname, attrs): self.startElement(name, attrs) def endElementNs(self, name, qname): self.endElement(name, attrs) def startElement(self, name, qname, attrs): ... do whatever you like on tag start... def characters(self, content): ... on tag content ... def endElement(self, name): ... on tag closing ... if __name__ == "__main__": parse(argv[1], Handler())
Here I assumed argv[1] is a path to the file you'd like to parse. (first argument to parse() function is filename or stream). It is easy to convert it to for loop: just grab all the information you need in the methods above and push them into some list or stack. Iterate over it once you have finished parsing.
Upvotes: 1
Reputation: 690
You would probably like to use something like ElementTree This is a well renowned library, I have not personally used it but I always hear good things.
Also as of python 2.5 it's part of the standard library
Upvotes: 0
Reputation: 123821
import xml.dom.minidom as md
x='''<?xml version="1.0"?>
<test>
<items>
<item>item 1</item>
<item>item 2</item>
</items>
</test>
'''
xml=md.parseString(x)
items=xml.getElementsByTagName("item")
# [<DOM Element: item at 0xc16e40>, <DOM Element: item at 0xc16ee0>]
since items
is DOM Element Array, you could loop with for
Upvotes: 1