Reputation: 5818
I have the following simple Python script used as a post-export test to verify the exported XML is valid.
import sys
from xml.etree import ElementTree

args = sys.argv[1:]  # args[0] is the path to the exported XML file
try:
    ElementTree.parse(args[0])
except ElementTree.ParseError:
    raise Exception('%s does not contain valid XML.' % args[0])
The VM the script runs on, however, appears to run out of memory with the latest export file, which is about 88 MB in size.
On my local workstation, the same script parses the same file in about 30 seconds without error.
The XML itself is not particularly deep; I think the maximum depth is about four levels. The list is fairly long, though, at 38,570 items. I'm therefore thinking there's probably a much more efficient way of parsing this, since I have no desire to store or handle the result of the parsing: I simply want to make sure the XML is valid.
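For validation only, one memory-friendly option in the standard library is `ElementTree.iterparse`, which walks the file as a stream and lets you discard each element once it has been fully parsed. Here is a minimal sketch; `validate_xml` is an illustrative name, not from the script above:

```python
import sys
from xml.etree import ElementTree


def validate_xml(path):
    """Stream-parse the file, clearing elements as they complete,
    so peak memory stays low even for large documents."""
    try:
        for _, elem in ElementTree.iterparse(path):
            elem.clear()  # drop the element's children and text once parsed
    except ElementTree.ParseError as e:
        raise Exception('%s does not contain valid XML: %s' % (path, e))


if __name__ == '__main__':
    validate_xml(sys.argv[1])
```

Note that the root element still keeps (now-empty) references to its cleared children, so memory is not perfectly constant, but for a flat list of ~38,570 items this is typically a large improvement over building the whole tree with `parse`.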
Upvotes: 0
Views: 473
Reputation: 2900
I don't know Python, but I suggest checking what type of parser ElementTree.parse uses.
If it's a DOM parser, try to find a SAX parser and use that instead. SAX parsers are more memory-efficient because they emit events as they read the document instead of building the entire tree in memory.
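Python's standard library does include a SAX parser in `xml.sax`. A sketch of pure validation with it might look like this; `validate_with_sax` is an illustrative name, and the empty `ContentHandler` simply discards every event:

```python
import xml.sax


def validate_with_sax(path):
    """Event-based parse: nothing is stored, so memory use is
    essentially constant regardless of document size."""
    try:
        # A bare ContentHandler overrides no callbacks, so all
        # parse events are received and ignored.
        xml.sax.parse(path, xml.sax.handler.ContentHandler())
    except xml.sax.SAXParseException as e:
        raise Exception('%s does not contain valid XML: %s' % (path, e))
```

Any well-formedness error (mismatched tags, bad entities, truncated file) surfaces as a `SAXParseException`, which is exactly the signal the question needs.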
Upvotes: 1