Reputation: 2589
Find the Level of Sections in XML Structured Document - perl Input:
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<section>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
I need to fetch all the section level elements and insert the value based on the levels. And the desired output like as follows:
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<section3>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section3>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
First try:
foreach my $lines ( @splitCnt ) {
if ( $lines =~ m/<section\s+/g ) {
$opn++;
$lines =~ s/<section\s+/<section$opn /i;
$cls = $opn;
$opn++;
}
elsif ( $lines =~ m/<\/section>/g ) {
$opn = $opn - 1;
$lines =~ s/<\/section>/<\/section$opn>/i;
}
$all_lines .= "$lines\n";
}
Second Try:
my ( $pre1, $match1, $post1 ) = "";
while ( $incnt =~ m/<section\s+[^>]*>/g ) {
$pre1 = $`;
$match1 = $&;
$post1 = $';
my $Opn = '1';
my $Cls = "";
$match1 =~ s/<section\s+/<section$Opn /gi;
if ( $post1 =~ m/<section\s+/i ) {
$Opn++;
$post1 =~ s/<section\s+/<section$Opn /;
$Opn = $Cls;
}
elsif ( $post1 =~ m/<\/section>/i ) {
$post1 =~ s/<\/section/<\/section$Cls/;
}
$pre1 .= $match1;
$incnt = $post1;
print "$pre1\n";
system 'pause';
}
if ( length $pre1 ) {
$incnt = $pre1 . $post1;
}
Anyone could help on this one...
Upvotes: 2
Views: 65
Reputation: 126742
Here's a variant using the XML::LibXML
module. It simply finds all the section
elements and caclulates their hierarchy by counting the number of slashes in the XPath expression to reach them
However, as others have said, this is a strange thing to want to do, and it very much sounds like its a bad solution to a different problem. If you explained the complete problem then we could help you a lot better
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(IO => \*DATA);
for my $section ( $doc->findnodes('//section') ) {
my $n = $section->nodePath =~ tr|/|| - 1;
$section->setNodeName("section$n");
}
print $doc;
__DATA__
<root>
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<section>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
</root>
<?xml version="1.0"?>
<root>
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<section3>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section3>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
</root>
Upvotes: 2
Reputation: 53498
Seriously - don't use regular expresssions for XML. It's bad news. There's a bunch of perfectly valid things you can do with XML that break regular expressions - so what you get is broken XML, and brittle code that may well break horribly one day, and no one will know quite why.
Use a parser. Personally - I like XML::Twig
It's quite easy to take and rename tags as you ask:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
sub process_section {
my ( $section, $depth ) = @_;
$depth++;
$section->set_tag("section$depth");
foreach my $subsection ( $section->children('section') ) {
process_section( $subsection, $depth );
}
}
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' );
$twig->parsefile ( 'your_file.xml' );
foreach my $section ( $twig->findnodes('section') ) {
process_section( $section, 0 );
}
$twig->print;
I would also point out though - your initial question sounds like an XY problem. What are you trying to accomplish? It's often undesirable to do this sort of operation - changing tags based on hierarchy, because then ... well, then you couldn't do what I've just done - iterate recursively through the data structure.
Upvotes: 4