Reputation: 33
I am new to Perl and exploring it.
I have an .xml file and I want to extract a few sections of it.
Each section starts and ends with <field>, and I want to get the content in between:
<field>
<address>20</address>
<startat>0</startat>
<size>8</size>
<field>
<field>
<address>21</address>
<startat>0</startat>
<size>8</size>
<field>
and the output I am looking for is:
<address>20</address>
<startat>0</startat>
<size>8</size>
<address>21</address>
<startat>0</startat>
<size>8</size>
How would I go about extracting that part of the file?
Any help is much appreciated.
Upvotes: 0
Views: 71
Reputation: 2154
You can solve this by processing the file as plain text, but it is always safer to use an XML parser. There are a number of excellent Perl XML libraries available on CPAN. One that I like is XML::LibXML, which is an interface to libxml2. It provides lots of possibilities. Using the XPath functionality it offers (see XML::LibXML::XPathContext), we could do:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $parser = XML::LibXML->new( recover => 1 );
# parse_string dies on a fatal parse error, so wrap it in eval;
# otherwise $@ is never set and the check below cannot trigger.
my $xp = eval { $parser->parse_string(<<'EndXML') };
<document>
<field>
<address>20</address>
<startat>0</startat>
<size>8</size>
</field>
<field>
<address>21</address>
<startat>0</startat>
<size>8</size>
</field>
</document>
EndXML
if( $@ ) {
die "Cannot parse XML: $@\n";
}
foreach my $c ( $xp->findnodes('//field') ) {
print $c->findnodes('.'), "\n";
}
The output:
<field>
<address>20</address>
<startat>0</startat>
<size>8</size>
</field>
<field>
<address>21</address>
<startat>0</startat>
<size>8</size>
</field>
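The output above still includes the enclosing <field> tags, while the question asks only for the lines inside them. A minimal sketch of one way to get that, assuming the same sample wrapped in a <document> root element (the root name and the variable names here are only illustrative): select the element children of each <field> with the XPath './*' and print each one with toString.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Sample input; the <document> root is an assumption made for this sketch.
my $xml = <<'EndXML';
<document>
<field>
<address>20</address>
<startat>0</startat>
<size>8</size>
</field>
<field>
<address>21</address>
<startat>0</startat>
<size>8</size>
</field>
</document>
EndXML

my $doc = XML::LibXML->load_xml( string => $xml );

# './*' matches only the element children of each <field>,
# so the whitespace text nodes between them are skipped.
my @lines;
foreach my $field ( $doc->findnodes('//field') ) {
    push @lines, map { $_->toString } $field->findnodes('./*');
}

print "$_\n" for @lines;
```

For a real file, load_xml( location => $filename ) could replace the string argument; this only works if the file is well-formed, that is, each section is closed with </field>.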
A few comments:
recover => 1 may be useful to parse broken XML files. It will not fix all problems, but may be able to fix some of them. Omit it if you do not want any recovery. Use recover => 2 to make the recovery silent.
findnodes takes an XPath expression. In this case //field will find every <field> element. Then findnodes('.') returns the <field> node itself, which prints as the whole element, tags included.
Upvotes: 2