python cElementTree uses too much memory

Question

I have the following code segment:

import xml.etree.cElementTree as et

fstring = open(filename).read()
tree = et.fromstring(fstring)

for el in tree.findall('tag'):
    pass  # do stuff with el

However, fstring is huge (~80 MB of data), and I am hitting an "Out of memory" error when I try to convert the string to a tree. Is there a way to get around that, perhaps some kind of lazy evaluation of the tree?

Thanks!

EDIT:

I tried using iterparse, and it still gives me a MemoryError on the iterparse call. Is there a way to split the file into multiple chunks and process them one by one?

NPE · Accepted Answer

Take a look at iterparse:

For example, to parse large files, you can get rid of elements as soon as you’ve processed them:
from xml.etree.cElementTree import iterparse

# source is a filename or an open file object
for event, elem in iterparse(source):
    if elem.tag == "record":
        # ... process record elements ...
        elem.clear()
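
One caveat with the snippet above: elem.clear() empties each element, but the root still holds a reference to every (now empty) child, so memory can keep growing on very large files. A common workaround is to grab the root from the first "start" event and clear it after each record. The following is a minimal sketch of that idea, not a definitive implementation. The function name count_records, the "record" tag, and the in-memory sample are made up for illustration, and it imports xml.etree.ElementTree because cElementTree was removed in Python 3.9. It also assumes you pass a filename or file object to iterparse rather than reading the whole file into a string first.

```python
import io
import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9


def count_records(source, tag="record"):
    """Stream-parse `source` and return how many `tag` elements were seen.

    `source` is a filename or a binary file object (hypothetical helper).
    """
    count = 0
    context = et.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle to it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1  # placeholder for the real per-record processing
            # Clearing the root drops the processed children it still references.
            root.clear()
    return count


# Small in-memory document standing in for the large file.
sample = io.BytesIO(b"<root><record>a</record><record>b</record><other/></root>")
print(count_records(sample))
```

For a real file, you would call count_records(filename) directly, so the document is never loaded as one big string.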
