Rajkumar

Reputation: 3

Need to find HTML/XML nested tag levels using Perl

Is there a simple way to find the nesting level of a tag, i.e. how many elements with the same tag name enclose it (counting the tag itself)?

Note: I'm planning to write a Perl subroutine that takes the input below as a scalar and returns the output below as a scalar.

Input:

<sec>
  <sec></sec>
  <sec>
    <sec></sec>
  </sec>
</sec>

Output should be:

<sec level="1">
  <sec level="2"></sec>
  <sec level="2">
    <sec level="3"></sec>
  </sec>
</sec>

Upvotes: 0

Views: 107

Answers (1)

Shawn

Reputation: 52529

One approach uses XML::LibXML to build a DOM tree from the XML, then walks the tree and sets a level attribute on each matching tag, incrementing the level on the way into a matching element and decrementing it on the way out:

#!/usr/bin/env perl
use warnings;
use strict;
use XML::LibXML;

# Recursively walk a DOM tree, and invoke callbacks on elements
sub walk_elements {
    my ($node, $callbacks) = @_;
    $callbacks->{pre}->($node) if $node->nodeType == XML_ELEMENT_NODE;
    for my $child ($node->childNodes) {
        walk_elements($child, $callbacks);
    }
    $callbacks->{post}->($node) if $node->nodeType == XML_ELEMENT_NODE;
}

sub add_levels {
    my ($xml, $tagname) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    my $level = 1;
    walk_elements($dom->getDocumentElement,
                { pre => sub {
                    $_[0]->setAttribute('level', $level++)
                        if $_[0]->nodeName eq $tagname
                  },
                  post => sub { $level-- if $_[0]->nodeName eq $tagname }
                }
        );
    return $dom->toStringHTML; # Or toString for XML style tags
}

my $xml = <<EOXML;
<sec>
  <sec></sec>
  <sec>
    <sec></sec>
  </sec>
</sec>
EOXML

print add_levels($xml, 'sec');

Running this script outputs:

<sec level="1">
  <sec level="2"></sec>
  <sec level="2">
    <sec level="3"></sec>
  </sec>
</sec>
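
The same levels can also be computed without a recursive walk, by letting XPath count each matching element's ancestors. The following is only a sketch of that alternative, not part of the script above: the name add_levels_xpath is made up for illustration, and it assumes $tagname is a plain element name with no namespace prefix, since it is interpolated directly into the XPath expressions.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use XML::LibXML;

# Hypothetical helper: set level="N" on every <$tagname> element, where N
# is the number of <$tagname> elements on its ancestor-or-self axis.
# Assumes $tagname is a simple name that is safe to interpolate into XPath.
sub add_levels_xpath {
    my ($xml, $tagname) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    for my $node ($dom->findnodes("//$tagname")) {
        # count() returns an XPath number; findvalue gives its string form
        my $level = $node->findvalue("count(ancestor-or-self::$tagname)");
        $node->setAttribute('level', $level);
    }
    # Serialize only the root element, so no XML declaration is emitted
    return $dom->documentElement->toString;
}

print add_levels_xpath(<<'EOXML', 'sec'), "\n";
<sec>
  <sec></sec>
  <sec>
    <sec></sec>
  </sec>
</sec>
EOXML
```

Because this version serializes with toString, empty elements may come out in the self-closing XML form rather than as an open and close pair. It also does more work per element than the single-pass walk, which should only matter for very large documents.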

Upvotes: 2
