lxml and fast_iter eating all the memory

Question

I want to parse a 1.6 GB XML file with Python (2.7.2) using lxml (3.2.0) on OS X (10.8.2). Because I had read about potential memory consumption issues, I already use fast_iter, but after the main loop the process consumes about 8 GB of RAM, even though it doesn't keep any data from the actual XML file.

from lxml import etree

def fast_iter(context, func, *args, **kwargs):
    # http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    # Author: Liza Daly
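    for event, elem in context:
        func(elem, *args, **kwargs)
        # Free the element's own children, text and attributes.
        elem.clear()
        # Drop already-processed preceding siblings from the parent.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context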

def process_element(elem):
    pass

context = etree.iterparse("sachsen-latest.osm", tag="node", events=("end", ))
fast_iter(context, process_element)

I don't understand why there is such a massive leak: the element and the whole context are deleted in fast_iter(), and at the moment I don't even process the XML data.

Any ideas?
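
One possible explanation, offered as an assumption rather than a confirmed diagnosis: with tag="node", only node elements ever reach the loop, so any other elements in the file (in an OSM extract, things like way and relation) are never cleared and may pile up in the tree. A minimal sketch of the alternative, handling every element and filtering by tag inside the loop, is below. It deliberately uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; the names iter_top_level and SAMPLE are hypothetical, and the tiny inline document stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET


def iter_top_level(source, wanted_tag, func):
    """Stream `source`, call `func` on each wanted top-level element,
    and free every top-level element, wanted or not."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    depth = 0                # nesting depth below the root
    seen = 0
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0:       # a direct child of the root just ended
            if elem.tag == wanted_tag:
                func(elem)
                seen += 1
            # Drop everything parsed so far, including non-matching elements.
            root.clear()
    return seen


# Hypothetical stand-in for the real OSM file.
SAMPLE = (b"<osm><node id='1'/><way id='9'><nd ref='1'/></way>"
          b"<node id='2'><tag k='a' v='b'/></node></osm>")

ids = []
count = iter_top_level(io.BytesIO(SAMPLE), "node",
                       lambda e: ids.append(e.get("id")))
```

If this assumption holds, the same idea should carry over to lxml by dropping the tag argument from iterparse and checking elem.tag inside fast_iter before calling func.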
