Shang Wang

Reputation: 25539

python/xml: How to quickly determine root element without parsing the whole file?

I have several large XML files that come from different sources. They can easily be distinguished by the root tag of each file. However, parsing them takes some time, so I don't want to parse a whole file just to get the root and determine what type of XML it is. Does anybody know a way to do a quick lookup without loading everything into memory? I'm currently using ElementTree. Thanks!

Upvotes: 0

Views: 227

Answers (1)

Mike Sokolov

Reputation: 7044

You need a streaming parser, not one that builds the entire tree up front. Take a look at http://docs.python.org/2/library/pyexpat.html and provide a start-element handler that saves the name of the first start element and then raises an exception, terminating parsing. That way you will only read the beginning of your huge file.
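A minimal sketch of that approach, using the standard-library `xml.parsers.expat` module. The names `root_tag`, `_RootFound`, and `chunk_size` are made up for illustration, and the sketch assumes the file is opened in binary mode so expat can detect the encoding itself:

```python
import io
import xml.parsers.expat


class _RootFound(Exception):
    """Hypothetical marker exception: raised to abort parsing early."""


def root_tag(stream, chunk_size=4096):
    """Return the name of the first start element in a binary stream.

    Returns None if the stream ends before any element starts.
    """
    parser = xml.parsers.expat.ParserCreate()
    found = []

    def on_start(name, attrs):
        # The first start element is the root: remember it and bail out.
        found.append(name)
        raise _RootFound

    parser.StartElementHandler = on_start
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            # False means "more data may follow"; expat parses incrementally.
            parser.Parse(chunk, False)
    except _RootFound:
        pass
    return found[0] if found else None


# In-memory demo; a real file would be opened with open(path, "rb").
print(root_tag(io.BytesIO(b'<?xml version="1.0"?><catalog><item/></catalog>')))
# prints: catalog
```

Only the chunks up to and including the root start tag are read, so the cost does not depend on the file size. If the document is malformed before the root element, expat raises `xml.parsers.expat.ExpatError`, which this sketch lets propagate. Without namespace processing enabled, a prefixed root is returned as written (for example `ns:root`).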

Upvotes: 1
