Reputation: 2989
I have a 30 GB XML file that contains two classes of data. Each class 1 record has a corresponding class 2 record:
<id="11" class="1" bestmatchingid="50" Body="abc"> </id>
.
.
.
<id="9999890" class="2" MatchingClass1Id="11" Body="xyz"></id>
The task is to extract each class 1 Body together with the Body of its corresponding class 2 record, i.e. the class 2 record whose MatchingClass1Id equals the class 1 id. In the example above, the class 1 record with id 11 matches the class 2 record with id 9999890, because that record's MatchingClass1Id is 11.
I am currently doing this with string comparisons in Python. Is there a more efficient way to do it in Python, given that the file is 30 GB?
Upvotes: 0
Views: 161
Reputation: 2989
lxml works well for this purpose. Since you are a beginner, refer to this tutorial to understand the basics:
http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html
Also, the iterparse method is an efficient way to solve your problem.
Upvotes: -1
Reputation: 363567
Use lxml's iterparse function. See the IBM DeveloperWorks article about it for how to use it on very large files.
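To make the idea concrete, here is a minimal sketch of incremental parsing. It uses the standard library's xml.etree.ElementTree.iterparse rather than lxml, so it runs without extra packages; lxml.etree.iterparse offers the same basic interface. Several things here are assumptions, not taken from the question: the records are taken to be well-formed elements named `row` (a hypothetical name, since the sample in the question is not valid XML as written), each class 1 record is assumed to appear before its matching class 2 record, and the class 1 bodies are assumed to fit in memory.

```python
import io
import xml.etree.ElementTree as ET


def iter_matching_bodies(source):
    """Yield (class1_body, class2_body) pairs from an XML file or stream.

    Assumes records are <row .../> elements (hypothetical tag name) and
    that each class 1 record precedes the class 2 record that refers to it.
    """
    class1_bodies = {}  # class 1 id -> Body; assumed to fit in memory
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "row":
            continue
        if elem.get("class") == "1":
            class1_bodies[elem.get("id")] = elem.get("Body")
        elif elem.get("class") == "2":
            body1 = class1_bodies.get(elem.get("MatchingClass1Id"))
            if body1 is not None:
                yield body1, elem.get("Body")
        # Detach already-processed records from the root so the tree
        # does not grow with the size of the file.
        root.clear()


if __name__ == "__main__":
    sample = (
        b'<data>'
        b'<row id="11" class="1" bestmatchingid="50" Body="abc"/>'
        b'<row id="9999890" class="2" MatchingClass1Id="11" Body="xyz"/>'
        b'</data>'
    )
    for body1, body2 in iter_matching_bodies(io.BytesIO(sample)):
        print(body1, body2)
```

For the real file you would pass the file path (or a file object opened in binary mode) instead of the in-memory sample. If the class 1 bodies do not fit in memory, the dictionary could be swapped for a disk-backed mapping such as sqlite3 or dbm; that part is left out of the sketch.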
Upvotes: 4