Reputation: 580
I'm just a beginner in Perl and urgently need to prepare a small script that takes the top 3 items from an XML file and puts them in a new one. Here's an example of an XML file:
<article>
{lot of other stuff here}
</article>
<article>
{lot of other stuff here}
</article>
<article>
{lot of other stuff here}
</article>
<article>
{lot of other stuff here}
</article>
What I'd like to do is get the first 3 items, along with all the tags in between, and put them into another file. Thanks in advance for all the help. Regards, Peter
Upvotes: 1
Views: 1621
Reputation: 338188
Never ever use Regex to handle markup languages.
The original version of this answer (see below) used XML::XPath. Grant McLean said in the comments:
XML::XPath is an old and unmaintained module. XML::LibXML is a modern, maintained module with an almost identical API and it's faster too.
so I made a new version that uses XML::LibXML (thanks, Grant):
use warnings;
use strict;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(location => 'articles.xml');
my $xp = XML::LibXML::XPathContext->new($doc->documentElement);
my $xpath = '/articles/article[position() < 4]';

foreach my $article ( $xp->findnodes($xpath) ) {
    # now do something with $article
    print $article.": ".$article->getName."\n";
}
For me this prints:
XML::LibXML::Element=SCALAR(0x346ef90): article
XML::LibXML::Element=SCALAR(0x346ef30): article
XML::LibXML::Element=SCALAR(0x346efa8): article
Links to the relevant documentation:
$doc will be XML::LibXML::Document.
$xp is XML::LibXML::XPathContext.
$xp->findnodes() is XML::LibXML::NodeList.
$article is XML::LibXML::Element.
Original version of the answer, based on the XML::XPath package:
use warnings;
use strict;
use XML::XPath;

my $xp = XML::XPath->new(filename => 'articles.xml');
my $xpath = '/articles/article[position() < 4]';

foreach my $article ( $xp->findnodes($xpath)->get_nodelist ) {
    # now do something with $article
    print $article.": ".$article->getName ."\n";
}
which prints this for me:
XML::XPath::Node::Element=REF(0x38067b8): article
XML::XPath::Node::Element=REF(0x38097e8): article
XML::XPath::Node::Element=REF(0x3809ae8): article
$xp is XML::XPath, obviously.
$xp->findnodes() is XML::XPath::NodeSet.
$article will be XML::XPath::Node::Element in this case.
Have a look at the docs to find out what you can do with them.
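Since the question asks for the first three articles to end up in a new file, here is a minimal sketch of one way that could look with XML::LibXML. It assumes, like the XPath above, that the articles sit inside an <articles> root element. The helper name first_articles, the sample titles and the output file name top3.xml are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: builds a new document that contains copies of the
# first $count <article> children of the root element of $source.
sub first_articles {
    my ($source, $count) = @_;

    my $root = $source->documentElement;

    # New document with an empty root named like the source root.
    my $target   = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $new_root = $target->createElement($root->nodeName);
    $target->setDocumentElement($new_root);

    # position() <= N selects the first N <article> children of the root.
    for my $article ($root->findnodes("article[position() <= $count]")) {
        # importNode copies the node (with its subtree) into $target
        # and leaves the source document untouched.
        $new_root->appendChild($target->importNode($article));
    }

    return $target;
}

# Made-up sample input; use load_xml(location => 'articles.xml') for a file.
my $source = XML::LibXML->load_xml(string => <<'XML');
<articles>
  <article><title>one</title></article>
  <article><title>two</title></article>
  <article><title>three</title></article>
  <article><title>four</title></article>
</articles>
XML

my $top3 = first_articles($source, 3);

# Print the result; the argument 1 asks for indented output.
print $top3->toString(1);

# To write a file instead (file name is an assumption):
# $top3->toFile('top3.xml', 1);
```

Because the copies go into a fresh document with its own root element, the output stays well-formed no matter how the input is laid out across lines.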
Upvotes: 12
Reputation: 678
Here:
use strict;
use warnings;

open my $input, "<", "file.xml" or die $!;
open my $output, ">", "truncated-file.xml" or die $!;

my $n_articles = 0;
while (<$input>) {
    # copy every line until the third closing tag has been written
    print $output $_;
    if (m:</article>:) {
        $n_articles++;
        if ($n_articles >= 3) {
            last;
        }
    }
}

close $input or die $!;
close $output or die $!;
You really don't need an XML parser to do such a simple job.
Upvotes: 0