python/xml: How to quickly determine root element without parsing the whole file?

Question

I have several large XML files that come from different sources. They can be easily distinguished by looking at the root tag of each one. However, parsing them can take some time, so I don't want to parse a whole file just to get the root and determine what type of XML it is. Does anybody know a way to do a quick lookup without loading everything into memory? I'm using ElementTree as the tool now. Thanks!

Mike Sokolov · Accepted Answer

You need a streaming parser, not a parser that builds an entire tree up front. Take a look at http://docs.python.org/2/library/pyexpat.html and provide a start element handler that saves the name of the first start element and then raises an exception, terminating parsing. That way you will only read the beginning of your huge file.
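
A minimal sketch of that idea, assuming Python 3 and the standard `xml.parsers.expat` module (the current name for the pyexpat interface linked above). The names `root_tag` and `_RootFound` and the `chunk_size` parameter are made up for illustration. The file is fed to the parser in small chunks, so reading stops as soon as the first start tag has been seen:

```python
import io
import xml.parsers.expat


class _RootFound(Exception):
    """Hypothetical private exception, raised only to abort parsing early."""


def root_tag(source, chunk_size=4096):
    """Return the name of the root element of a binary file-like object.

    Only as many chunks as are needed to reach the first start tag are read.
    """
    parser = xml.parsers.expat.ParserCreate()
    found = []

    def on_start(name, attrs):
        # The first start element in a document is the root: save it and bail out.
        found.append(name)
        raise _RootFound

    parser.StartElementHandler = on_start
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                # End of input without any element; expat reports this as an error.
                parser.Parse(b"", True)
                break
            parser.Parse(chunk, False)
    except _RootFound:
        pass
    return found[0] if found else None


if __name__ == "__main__":
    sample = io.BytesIO(b"<?xml version='1.0'?><catalog><item/></catalog>")
    print(root_tag(sample))
```

With a real file you would open it in binary mode, for example `with open(path, "rb") as f: kind = root_tag(f)`. Without namespace processing enabled, expat reports the tag name exactly as written, prefix included, and an input that contains no element at all raises `xml.parsers.expat.ExpatError`.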
