Simon2233

Read the contents of all XML tags whose names start with a certain string

I have an XML structure, and I want to write a Perl script that reads the contents of all tags whose names start with a certain string.

Example:

<tag-0>
    <tag-1>This is<tag-2>some example</tag-2>text</tag-1>
    <tag-3>This is some <ice-8> more </ice-8>text</tag-3>
    <tag-4>This 
        <tag-5>is 
            <tag-6>even more</tag-6>
        </tag-5> 
        <tag-7> text</tag-7>
    </tag-4>
</tag-0>

The purpose of the script is to find all elements named <tag-[num]> that contain a nested <tag-[num]> element. I'm not familiar with Perl, so how would I go about reading the contents of a tag whose name is "dynamic", and checking whether it contains more such tags?

In the above example, I would want to get tag-0, tag-1, tag-4, and tag-5, whose contents I would then manipulate further.

Answers (2)

haukex

XML::LibXML is my most-used XML module. There are plenty of others, but this one does just about everything I need, at the expense of sometimes being a little more verbose than other modules. The following prints the four desired nodes:

use warnings;
use strict;
use XML::LibXML;

my $dom = XML::LibXML->load_xml(string => <<'EOT');
<tag-0>
    <tag-1>This is<tag-2>some example</tag-2>text</tag-1>
    <tag-3>This is some <ice-8> more </ice-8>text</tag-3>
    <tag-4>This 
        <tag-5>is 
            <tag-6>even more</tag-6>
        </tag-5> 
        <tag-7> text</tag-7>
    </tag-4>
</tag-0>
EOT

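# Matches any element whose name begins with "tag-".
my $expr = "*[substring(name(), 1, 4) = 'tag-']";
# Visit every such element in the document...
for my $node ( $dom->findnodes("//$expr") ) {
    # ...and report it if at least one direct child also matches.
    my @children = $node->findnodes("./$expr");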
    if (@children) {
        print $node->nodeName,"\n";
    }
}
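The substring() test above accepts any element whose name begins with tag-, so a name such as tag-x would also count. If only tag- followed by digits should match, one option is to move the name check into Perl with a regex. The following is a sketch, assuming XML::LibXML is installed; the helpers is_numbered_tag and numbered_tags_with_numbered_children are made-up names for illustration:

```perl
use warnings;
use strict;
use XML::LibXML;

# Hypothetical helper: true if the element's name is "tag-" followed by digits.
sub is_numbered_tag {
    my ($node) = @_;
    return $node->nodeName =~ m/\Atag-[0-9]+\z/;
}

# Hypothetical helper: names of numbered-tag elements that have at least one
# numbered-tag element as a direct child, in document order.
sub numbered_tags_with_numbered_children {
    my ($dom) = @_;
    my @found;
    for my $node ( $dom->findnodes('//*') ) {
        next unless is_numbered_tag($node);
        # './*' selects element children only, so text nodes are skipped.
        my @kids = grep { is_numbered_tag($_) } $node->findnodes('./*');
        push @found, $node->nodeName if @kids;
    }
    return @found;
}

my $dom = XML::LibXML->load_xml(string => <<'EOT');
<tag-0><tag-1>a<tag-2>b</tag-2></tag-1><tag-x><tag-3/></tag-x></tag-0>
EOT
# Prints tag-0 and tag-1; tag-x is skipped even though it contains tag-3.
print "$_\n" for numbered_tags_with_numbered_children($dom);
```

With the original substring() expression, tag-x would be reported as well.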

Note that your problem description is a little unclear: does "contain a nested <tag-[num]>" mean that only direct children are to be considered, or should <tag-0>A<x>B<tag-1>C</tag-1>D</x>E</tag-0> also return tag-0?

If it should, change the second findnodes expression from "./$expr" to ".//$expr", which matches descendants at any depth rather than only direct children.
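To see the difference between the two expressions, here is a small sketch (assuming XML::LibXML is installed; the helper tags_with_nested is a hypothetical name) that runs both against the <x> example:

```perl
use warnings;
use strict;
use XML::LibXML;

my $expr = "*[substring(name(), 1, 4) = 'tag-']";

# Hypothetical helper: names of tag-* elements that contain a tag-* element
# along $axis, which is "./" (direct children) or ".//" (any depth).
sub tags_with_nested {
    my ($dom, $axis) = @_;
    return map  { $_->nodeName }
           grep { my @hits = $_->findnodes("$axis$expr"); @hits > 0 }
           $dom->findnodes("//$expr");
}

my $dom = XML::LibXML->load_xml(
    string => '<tag-0>A<x>B<tag-1>C</tag-1>D</x>E</tag-0>');
print "children:    ", join(',', tags_with_nested($dom, './')),  "\n";  # children:
print "descendants: ", join(',', tags_with_nested($dom, './/')), "\n";  # descendants: tag-0
```

Only the ".//" form reports tag-0, because tag-1 is a grandchild of tag-0 rather than a child.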

Grinnz

Using Mojo::DOM:

use strict;
use warnings;
use Mojo::DOM;

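# Read the XML from the file named on the command line, or from STDIN.
my $xml = do { local $/; <> };
my $dom = Mojo::DOM->new->xml(1)->parse($xml);

# Keep elements named tag-<digits> that contain a tag-<digits> element at any depth.
my @tags_with_subtags = $dom->find('*')->grep(sub {
  $_->tag =~ m/\Atag-[0-9]+\z/ and $_->find('*')->grep(sub {
    $_->tag =~ m/\Atag-[0-9]+\z/
  })->size
})->each;

print $_->tag, "\n" for @tags_with_subtags;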

Each result is a Mojo::DOM object that you can search or manipulate further. As far as I know, CSS selectors cannot match element names against a pattern, so the name check has to be done in Perl; if the varying part were an attribute value instead, a prefix selector such as [id^="tag-"] would do the job directly.
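The inner find('*') looks at descendants at any depth, which corresponds to the ".//$expr" variant of the XML::LibXML answer. If only direct children should count, Mojo::DOM's children method can take its place. A sketch, assuming Mojo::DOM is installed; the helper name tags_with_subtag_children is hypothetical:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: names of elements called tag-<digits> that have at
# least one direct child element also called tag-<digits>.
sub tags_with_subtag_children {
    my ($xml) = @_;
    my $dom = Mojo::DOM->new->xml(1)->parse($xml);
    my $is_numbered = sub { $_[0]->tag =~ m/\Atag-[0-9]+\z/ };
    return $dom->find('*')->grep(sub {
        # children() only looks one level down, unlike find('*').
        $is_numbered->($_) and $_->children->grep($is_numbered)->size
    })->map('tag')->each;
}

# Prints nothing: tag-1 is a grandchild of tag-0, not a child.
print "$_\n" for tags_with_subtag_children('<tag-0>A<x>B<tag-1>C</tag-1>D</x>E</tag-0>');
```

On the question's (corrected) XML, this version should still report tag-0, tag-1, tag-4 and tag-5, since every match there is a direct child.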
