Reputation: 113
I have an XML structure and want to write a Perl script that reads the content of all tags whose names start with a certain string.
Example:
<tag-0>
<tag-1>This is<tag-2>some example</tag-2>text</tag-1>
<tag-3>This is some <ice-8> more </ice-8>text</tag-3>
<tag-4>This
<tag-5>is
<tag-6>even more</tag-6>
</tag-5>
<tag-7> text</tag-7>
</tag-4>
</tag-0>
The purpose of the script is to find all nodes named <tag-[num]> that contain a nested <tag-[num]>. I'm not familiar with Perl, so how would I go about reading the contents of a "dynamic" tag and checking for further dynamic tags nested inside it?
In the above example, I would want to get tag-0, tag-1, tag-4, and tag-5, whose contents I could then manipulate further.
Upvotes: 1
Views: 248
Reputation: 3013
XML::LibXML is my most-used XML module. There are plenty of others, but this one does just about everything I need, at the expense of sometimes being a little more verbose than other modules. The following prints the four desired nodes:
use warnings;
use strict;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => <<'EOT');
<tag-0>
<tag-1>This is<tag-2>some example</tag-2>text</tag-1>
<tag-3>This is some <ice-8> more </ice-8>text</tag-3>
<tag-4>This
<tag-5>is
<tag-6>even more</tag-6>
</tag-5>
<tag-7> text</tag-7>
</tag-4>
</tag-0>
EOT
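# XPath step matching any element whose name starts with "tag-"
my $expr = "*[substring(name(), 1, 4) = 'tag-']";
for my $node ( $dom->findnodes("//$expr") ) {
    # "./" restricts the search to direct children of $node
    my @children = $node->findnodes("./$expr");
    if (@children) {
        print $node->nodeName, "\n";
    }
}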
Note that your problem description is a little unclear: does "contain a nested <tag-[num]>" mean that only direct children are to be considered, or should <tag-0>A<x>B<tag-1>C</tag-1>D</x>E</tag-0> also return tag-0? If the latter, change the second findnodes expression to ".//$expr".
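To make the difference between the two expressions concrete, here is a minimal sketch that runs both on the one-line example from the note. The helper name tags_with_nested and its $axis parameter are illustrative only and not part of XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Same XPath step as above: any element whose name starts with "tag-"
my $expr = "*[substring(name(), 1, 4) = 'tag-']";

# Hypothetical helper: returns the names of tag-* elements that contain
# another tag-* element. $axis is "./" (direct children only) or
# ".//" (descendants at any depth).
sub tags_with_nested {
    my ($xml, $axis) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    my @names;
    for my $node ( $dom->findnodes("//$expr") ) {
        my @nested = $node->findnodes("$axis$expr");
        push @names, $node->nodeName if @nested;
    }
    return @names;
}

my $xml = '<tag-0>A<x>B<tag-1>C</tag-1>D</x>E</tag-0>';
print "children only: ", join(',', tags_with_nested($xml, './')),  "\n";
print "any depth:     ", join(',', tags_with_nested($xml, './/')), "\n";
```

With "./" nothing is reported, because the only element child of tag-0 is x; with ".//" tag-0 is reported, because tag-1 is one of its descendants.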
Upvotes: 2
Reputation: 9231
Using Mojo::DOM:
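use strict;
use warnings;
use Mojo::DOM;
# Assumption: the XML document is read from STDIN; load $xml however suits you
my $xml = do { local $/; <STDIN> };
my $dom = Mojo::DOM->new->xml(1)->parse($xml);
# Keep tag-N elements that have at least one tag-N descendant
my @tags_with_subtags = $dom->find('*')->grep(sub {
    $_->tag =~ m/\Atag-[0-9]+\z/ and $_->find('*')->grep(sub {
        $_->tag =~ m/\Atag-[0-9]+\z/
    })->size
})->each;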
Each of the results is a Mojo::DOM object you can further search or manipulate. Unfortunately, CSS selectors are not (as far as I know) well suited to matching dynamic tag names, so you have to do that part yourself; it would be much easier if the dynamic part were an attribute value instead.
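For comparison, here is a rough sketch of the attribute case. It assumes a hypothetical variant of the markup in which every element is called node and the dynamic part lives in a name attribute; that layout and the helper outer_names are made up for illustration and do not come from the question.

```perl
use strict;
use warnings;
use Mojo::DOM;

# CSS attribute selector: the name attribute starts with "tag-"
my $sel = '[name^="tag-"]';

# Hypothetical helper: returns the name attribute of every matching
# element that has at least one matching descendant.
sub outer_names {
    my ($xml) = @_;
    my $dom = Mojo::DOM->new->xml(1)->parse($xml);
    return $dom->find($sel)
               ->grep(sub { $_->find($sel)->size })
               ->map(sub { $_->attr('name') })
               ->each;
}

# Made-up input: same idea as the question, with the name in an attribute
my $xml = '<node name="tag-0">'
        .   '<node name="tag-1">a<node name="tag-2">b</node></node>'
        .   '<node name="ice-8">c</node>'
        . '</node>';

print "$_\n" for outer_names($xml);
```

Here the selector does the name matching, so no regular expression is needed; only the "has a matching descendant" check is still done in Perl.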
Upvotes: 1