Reputation: 3560
I've used XML::Simple for over a decade and it's done everything I need it to, and I barely ever touch Perl any more. Right now, though, I need to parse an XML string and simply get all of the elements that are children of the root and, for each one, its element type, attributes, and content (I don't care if there are any nested elements; just reading the content as a string is perfect). I can do all that with XML::Simple, except that I also need to keep the order, which XML::Simple can't do when there are multiple element types.
I just installed XML::Twig and it looks overwhelming for something I hoped would be a quick script. It's unlikely that I'll ever use Twig again after this. Is this something that Twig can do easily?
Upvotes: 2
Views: 3852
Reputation: 53478
At a simple level, XML::Twig lets you traverse the children of the root:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'myxml.xml' );
foreach my $element ( $twig -> root -> children ) {
    print $element -> text; #element content.
}
Extracting element attributes is done either one at a time with:
$element -> att('attributename');
Or you can fetch a hash ref of all of them with atts:
my $attributes = $element -> atts();
foreach my $key ( keys %$attributes ) {
    print "$key => ", $attributes -> {$key}, "\n";
}
What I particularly like, though, is that for XML containing a long list of similar elements you can define a handler. It is called each time the parser finishes one of those elements, and it is handed just that subset of the XML.
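sub process_book {
    my ( $twig, $book ) = @_;
    print $book -> first_child_text ('title'), "\n"; #text of the <title> child, not the element object.
    $twig -> purge; #discard anything we've already seen.
}
#tag names are case-sensitive, so the handler key must match <BOOK> in the sample XML.
my $twig = XML::Twig -> new ( twig_handlers => { 'BOOK' => \&process_book } );
$twig -> parsefile ( 'books.xml' );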
Sample XML:
<XML>
<BOOK>
<title>Elements of style</title>
<author>Strunk and White</author>
</BOOK>
</XML>
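Putting the pieces above together for the original question, here is a minimal sketch that parses a string and collects the tag, attributes, and text of each child of the root in document order. The sample XML, the root_children sub, and the record layout are made up for illustration; only the XML::Twig calls (parse, root, children, is_text, tag, atts, text) come from the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: mixed child element types, one with a nested element.
my $xml = <<'XML';
<root>
  <e1 id="1">first</e1>
  <e2 id="2">second <b>nested</b> text</e2>
  <e1 id="3">third</e1>
</root>
XML

# Return one record per child element of the root, in document order.
sub root_children {
    my ($xml_string) = @_;
    my $twig = XML::Twig -> new;
    $twig -> parse ($xml_string);
    my @records;
    foreach my $element ( $twig -> root -> children ) {
        next if $element -> is_text;    # skip bare text directly under the root
        push @records, {
            tag  => $element -> tag,
            atts => { %{ $element -> atts || {} } },    # shallow copy of the attribute hash
            text => $element -> text,   # text of the element and its descendants
        };
    }
    return @records;
}

my @records = root_children ($xml);
foreach my $r (@records) {
    my $atts = join ', ', map { "$_=$r->{atts}{$_}" } sort keys %{ $r -> {atts} };
    print "$r->{tag} [$atts] $r->{text}\n";
}
```

The children come back in the order they appear in the document, which is the part XML::Simple loses. The attributes are still held in a plain hash, so they are sorted here only to make the printout stable.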
Upvotes: 4
Reputation: 16136
The code below should give you enough information to get started.
A few notes:
- the example calls parse on the DATA section so that it is self-contained; to read a file, use parsefile instead of parse
- you could use 'level(1)' instead of '/root/*' as the handler trigger
- calling the handler through an anonymous sub (process_elt), passing $atts and $strings, is the clean way to do this; if you want $atts and $strings to be global variables you can just write '/root/*' => \&process_elt and the handler will be called with the twig and the element as parameters
- the $t->purge bit is there to free the memory used by the element you just processed; it is useful if the file is too big to fit in memory, otherwise you don't need it
- DDP is Data::Printer; it's only there to check the output, and you can use any other way to do this (Data::Dumper, YAML, prints...)
Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $atts = []; # attributes
my $strings = []; # text content
XML::Twig->new( twig_handlers =>
                  { '/root/*' => sub { process_elt( @_, $strings, $atts); } })
         ->parse( \*DATA);
use DDP; p $atts; p $strings;
sub process_elt
  { my( $t, $elt, $strings, $atts)= @_;
    push @$atts, $elt->atts;
    my $string= $elt->text;
    if( $elt->tag eq 'e1')
      { $string=~ s{text}{modified}; }
    push @$strings, $string;
    $t->purge;
  }
__DATA__
<root>
<e1 att_1="val_1_1" att2= "val_2_1">text content of element 1</e1>
<e1 att_1="val_1_2" att2= "val_2_2">text content of element 2</e1>
<e2 att_3="val_3_1" att2= "val_2_3">element with <sub_elt>sub element</sub_elt> inside</e2>
</root>
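The code above collects attributes and text but not the element type, which the question also asks for. Here is a minimal sketch of the same handler approach that uses the 'level(1)' trigger from the notes and records the tag as well. The @seen array and the inline sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my @seen;   # hypothetical accumulator: one [ tag, atts, text ] entry per child, in document order

my $twig = XML::Twig->new(
    twig_handlers => {
        # 'level(1)' fires for every element directly under the root, whatever its tag
        'level(1)' => sub {
            my( $t, $elt)= @_;
            push @seen, [ $elt->tag, { %{ $elt->atts || {} } }, $elt->text ];
            $t->purge;   # safe here: the tag, attributes and text were copied above
        },
    },
);

$twig->parse( '<root><e1 a="1">one</e1><e2 b="2">two <sub_elt>inner</sub_elt></e2><e1 a="3">three</e1></root>');

print join( ' | ', $_->[0], $_->[2]), "\n" for @seen;
```

Handlers fire as each element is closed, so @seen ends up in document order even when the element types are interleaved.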
Upvotes: 1
Reputation: 241748
I prefer XML::LibXML. Its Reader interface doesn't need to keep the whole structure in memory, so it can process large files:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML::Reader;
my $reader = 'XML::LibXML::Reader'->new( location => 'file.xml' );
while ($reader->read) {
    # Only look at element nodes that are direct children of the root (depth 1).
    if (1 == $reader->depth
        and XML_READER_TYPE_ELEMENT == $reader->nodeType
    ) {
        my @info = ($reader->name);
        # Grab the content while the reader is still positioned on the element.
        my $inner = $reader->readInnerXml;
        for my $idx (0 .. $reader->attributeCount - 1) {
            $reader->moveToAttributeNo($idx);
            push @info, $reader->name . '=' . $reader->value;
        }
        push @info, $inner;
        print "@info\n";
    }
}
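Since the question is about parsing a string rather than a file, here is a minimal sketch of the same loop reading from a string and collecting the results instead of printing them. The sample XML and the @rows layout are made up for illustration; the reader methods are the ones used above, plus moveToElement and the string option of the constructor.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML::Reader;

# Hypothetical input string.
my $xml = '<root><e1 a="1">one</e1><e2 b="2" c="3">two <i>inner</i></e2></root>';

my $reader = 'XML::LibXML::Reader'->new( string => $xml );
my @rows;   # one hash per child of the root, in document order

while ($reader->read) {
    next unless 1 == $reader->depth
        and XML_READER_TYPE_ELEMENT == $reader->nodeType;

    my $name  = $reader->name;
    my $inner = $reader->readInnerXml;   # content as a string, nested markup included

    my @atts;   # [ name, value ] pairs, in the order the reader reports them
    for my $idx (0 .. $reader->attributeCount - 1) {
        $reader->moveToAttributeNo($idx);
        push @atts, [ $reader->name, $reader->value ];
    }
    $reader->moveToElement;   # step back from the last attribute to its element

    push @rows, { name => $name, atts => \@atts, inner => $inner };
}

print "$_->{name}: $_->{inner}\n" for @rows;
```

Storing the attributes as a list of pairs rather than a hash keeps them in the order the reader reports them, should that matter.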
Upvotes: 0