Jannat Arora
Jannat Arora

Reputation: 2989

Efficient way to perform xml parsing in python

I have an XML file of(30GB) which contains 2 classes of data, The data of class 1 has corresponding

<id="11" class="1" bestmatchingid="50" Body="abc"> </id>
.
.
.
<id="9999890" class="2" MatchingClass1Id="11" Body="xyz"></id>

Now the task is to extract class1's body and corresponding class 2's body where e.g.

class1's id(11)== MatchingClass1Id of class2(which is 9999890)

I am accomplishing the same by using string comparison's in Python...is there a more efficient way in Python to accomplish the same considering my file size is 30 GB

Upvotes: 0

Views: 161

Answers (2)

Jannat Arora
Jannat Arora

Reputation: 2989

lxml works good for your purpose. Also since you are a begineer..so for understanding the basic refer to the tutorial:

http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html

All iterparse method is an efficient method to solve your problem

Upvotes: -1

Fred Foo
Fred Foo

Reputation: 363567

Use LXML's iterparse function. See the IBM DeveloperWorks article about it for how to use it on very large files.

Upvotes: 4

Related Questions