Reputation: 4334
I have a huge XML file with loads of data, and I need to create a Perl script that parses the XML and extracts only the data that is needed.
I've been told to use expat. I was wondering if anyone had any good tutorials or articles on how to use Perl and expat to parse XML.
I hope this makes sense; I'm really new to Perl.
Upvotes: 0
Views: 354
Reputation: 267
If, as you stated, the XML file is huge and only some selected data is needed, then XML::Reader::RS should do the job: it uses XML::Parser as the underlying parsing module, which in turn uses expat to parse the XML.
The following code snippet extracts only the needed information from a potentially huge XML file, and it uses only a small amount of memory:
use strict;
use warnings;
use XML::Reader::RS;

# Read from the DATA handle in 'branches' mode, selecting only
# <line> elements whose cat attribute equals "A".
my $rdr = XML::Reader::RS->new(\*DATA, { mode => 'branches' },
    { root => '/info/line[@cat="A"]', branch => [ '/' ] });

while ($rdr->iterate) {
    my ($line) = $rdr->value;
    $line = '' unless defined $line;    # guard against empty elements
    print "line = '$line'\n";
}
__DATA__
<info>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="A">Data 0000001</line>
<line cat="A">Data 0000002</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
<line cat="xyz">abc</line>
</info>
(However, XML::Reader::RS is not the fastest. If you want speed as well as memory efficiency, you should consider using XML::Parser directly.)
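For illustration, a rough sketch of the same extraction done with XML::Parser directly could look like the following; the cat="A" filter and the print format are carried over from the snippet above, and 'huge.xml' is just a placeholder file name:

use strict;
use warnings;
use XML::Parser;

my $inside = 0;     # true while we are inside a matching <line cat="A">
my $text   = '';

my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $elem, %attr) = @_;
        if ($elem eq 'line' && defined $attr{cat} && $attr{cat} eq 'A') {
            $inside = 1;
            $text   = '';
        }
    },
    Char => sub {
        my ($expat, $str) = @_;
        $text .= $str if $inside;    # collect character data of the match
    },
    End => sub {
        my ($expat, $elem) = @_;
        if ($inside && $elem eq 'line') {
            print "line = '$text'\n";
            $inside = 0;
        }
    },
});

$parser->parsefile('huge.xml');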
Upvotes: 0
Reputation: 6524
It would probably be easiest to use expat indirectly through some wrapper such as XML-Twig or XML-Rules. But it would also be possible to parse with a pull parser such as XML::LibXML::Reader from XML-LibXML (which uses libxml instead of expat).
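As a rough sketch of the wrapper approach, pulling out the same cat="A" lines with XML::Twig in streaming fashion might look like this (the file name and the handler condition are placeholders to adapt to your data):

use strict;
use warnings;
use XML::Twig;

# Fire a handler for each <line cat="A"> element and discard the
# already-parsed part of the tree as we go, so memory stays small.
my $twig = XML::Twig->new(
    twig_handlers => {
        'line[@cat="A"]' => sub {
            my ($t, $elt) = @_;
            print "line = '", $elt->text, "'\n";
            $t->purge;    # free everything parsed so far
        },
    },
);

$twig->parsefile('huge.xml');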
Upvotes: 3