Kratos85

Reputation: 193

How to parse XML file in chunks

I have a very large XML file with 40,000 tag elements. When I use ElementTree to parse this file, it fails with memory errors. Is there a Python module that can read the XML file in chunks without loading the entire file into memory? And how can I use that module?

Upvotes: 7

Views: 5232

Answers (2)

Michael Dillon

Reputation: 32392

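This is a problem that people usually solve using SAX, which reports elements as events while the file is being read instead of building the whole tree in memory.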

If your huge file is basically a bunch of XML documents aggregated inside an overall XML envelope, then I would suggest using SAX (or plain string parsing) to break it up into a series of individual documents that you can then process using lxml.etree.
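To illustrate the SAX idea, here is a minimal sketch using the standard library's xml.sax. It assumes the records are elements named "item"; the RecordHandler class, the tag name, and the callback are all made up for the example. It only collects each record's attributes and text rather than rebuilding full sub-documents, so treat it as a starting point and not a complete splitter.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Hypothetical handler: calls `callback` once per record element."""

    def __init__(self, tag, callback):
        super().__init__()
        self.tag = tag            # name of the record element (assumed)
        self.callback = callback  # called with (attributes dict, text)
        self._attrs = None
        self._buf = None          # None means "not inside a record"

    def startElement(self, name, attrs):
        if name == self.tag:
            self._attrs = dict(attrs.items())
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == self.tag and self._buf is not None:
            self.callback(self._attrs, "".join(self._buf))
            self._buf = None  # release the record's data


if __name__ == "__main__":
    data = b"<items><item id='1'>a</item><item id='2'>b</item></items>"
    xml.sax.parseString(data, RecordHandler("item", print))
```

For a real file you would call xml.sax.parse("big.xml", handler) instead of parseString (the file name is a placeholder); the parser reads the file incrementally, so only the current record's data is held in memory.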

Upvotes: 2

Zach Kelling

Reputation: 53879

Probably the best library for working with XML in Python is lxml. In this case you should be interested in its iterparse/iterwalk functions, which let you handle elements one at a time instead of loading the whole document.
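A rough sketch of the iterparse approach follows. It uses the standard library's xml.etree.ElementTree so it runs without extra installs; lxml.etree also provides an iterparse function that can be used in much the same way. The iter_records helper and the "item" tag are assumptions for the example, and the cleanup step assumes the records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Hypothetical helper: yield each <tag> element as soon as it is complete."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Detach already-processed children from the root so the
            # tree does not keep growing as the file is read.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<items><item id='1'>a</item><item id='2'>b</item></items>"
    )
    for record in iter_records(sample, "item"):
        print(record.get("id"), record.text)
```

With a real file, pass the file name (or an open binary file) as source. The important part is the clear() call: without it, iterparse still builds the complete tree in the background and memory use ends up the same as a normal parse.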

Upvotes: 8
