Jin

Reputation: 6145

Python cElementTree uses too much memory

I have the following code segment

import xml.etree.cElementTree as et

fstring = open(filename).read()
tree = et.fromstring(fstring)

for el in tree.findall('tag'):
    pass  # do stuff

However, fstring is huge (~80 MB of data), and I am hitting an "Out of memory" error when I try to convert the string to a tree. Is there a way around that, perhaps some kind of lazy evaluation of the tree?

Thanks!

EDIT:

I tried using iterparse, and it still gives me a MemoryError on the iterparse call. Is there a way to split the file into multiple chunks and process them one by one?

Upvotes: 0

Views: 382

Answers (1)

NPE

Reputation: 500327

Take a look at iterparse:

For example, to parse large files, you can get rid of elements as soon as you’ve processed them:

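from xml.etree.cElementTree import iterparse

# source is a filename or an open file object, not the file's contents
for event, elem in iterparse(source):
    if elem.tag == "record":
        # ... process record elements ...
        elem.clear()  # release the element's contents once processed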
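
A slightly fuller sketch of the same idea, in case it helps. The helper name `iter_records` and the default tag `"record"` are made up for illustration, and it assumes the records are direct children of the root element. It imports `xml.etree.ElementTree` because on Python 3.3+ that module picks up the C accelerator automatically (and `cElementTree` was removed in 3.9). The important part is passing the filename or file object straight to `iterparse` rather than reading the whole file into a string first, and clearing the root as well, since otherwise the root keeps a reference to every processed child.

```python
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield each <tag> element from source, releasing it afterwards.

    source is a filename or a binary file object (hypothetical helper).
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Reached when the caller asks for the next record:
            # drop this element's contents and detach it from the root.
            elem.clear()
            root.clear()
```

Usage would look like `for rec in iter_records("big.xml"): ...`. Each element has to be handled inside the loop, because it is emptied as soon as the next one is requested.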

Upvotes: 2
