ssr1012

Reputation: 2589

Split the parent elements at a keyword tag in an XML file using Perl

Question Updated

I have a keyword tag <split/> in the XML file. Based on it, I need to split the document at that point: every element that is open where the keyword occurs must be closed, and the same elements must then be reopened (as DUMMY OPENING TAGS) after the keyword so that they match the closing tags that follow.

For example, input:

<section>
   <para> The para sample lines...
      <list>
     <list-item><para> ..... .... </para></list-item>
     <list-item><para> ..... .... </para></list-item>
     <list-item><para> ..... <split/> .... </para></list-item>
      </list>
     The para sample lines.. </para>
</section>

Expected Output:

<section>
   <para> The para sample lines...
      <list>
     <list-item><para> ..... .... </para></list-item>
     <list-item><para> ..... .... </para></list-item>
     <list-item><para> ..... </para></list-item>
      </list>
   </para>
</section>
*<split/>*
<section> <!--dummy tag-->
   <para><!--dummy tag-->
      <list><!--dummy tag-->
     <list-item><para><!--dummy tag--> <split/> .... </para></list-item>
      </list>
      The para sample lines.. </para>
</section>

Note: The asterisks are only there to mark the split point (the tag needs to be deleted).

I am very new to using modules for markup languages. Could someone help me get the idea? (I am also trying...)

Upvotes: 1

Views: 67

Answers (1)

Sobrique

Reputation: 53478

Here's an example of how you could do this using XML::Twig:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Parse the sample XML from the __DATA__ section below.
my $first_doc = XML::Twig -> parse ( \*DATA );

# Build a second document from a deep copy of the first one's root.
my $second_doc = XML::Twig -> new;
$second_doc -> set_root ( $first_doc -> root -> copy );

# First doc: delete every sibling that follows <split/>.
while ( my $after_split = $first_doc -> get_xpath('//split',0)->next_sibling ) {
   $after_split -> delete;
}

$first_doc -> get_xpath('//split',0) -> delete; # delete split tag.

# Second doc: delete every sibling that precedes <split/>.
while ( my $before_split = $second_doc -> get_xpath('//split',0)->prev_sibling ) {
   $before_split -> delete;
}

$second_doc -> get_xpath('//split',0) -> delete; # delete split tag.

$first_doc -> set_pretty_print ('indented_a');
$first_doc -> print;

print "\n--- second doc ---\n"; 
$second_doc -> set_pretty_print ('indented_a');
$second_doc -> print;


__DATA__
<section>
   <para>
      <list>
      <list-item><para> sample content for first doc <split/> second doc sample content </para></list-item>

      </list>
   </para>
</section>

This gives you as output:

<section>
  <para>
    <list>
      <list-item>
        <para> sample content for first doc </para>
      </list-item>
    </list>
  </para>
</section>

--- second doc ---
<section>
  <para>
    <list>
      <list-item>
        <para> second doc sample content </para>
      </list-item>
    </list>
  </para>
</section>

You will probably want to look at parsefile and sprint from XML::Twig to read your own file and generate the output.
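
As a rough illustration of those two methods, here is a minimal sketch that reads a file with parsefile, serialises the tree with sprint, and writes the result out. The helper name reformat_file and the file names are made up for the example, and the temporary directory is only there so the script runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use File::Temp qw(tempdir);
use XML::Twig;

# Hypothetical helper (not part of XML::Twig): read XML from $in_path and
# write it, re-indented, to $out_path. Returns the serialised XML.
sub reformat_file {
    my ( $in_path, $out_path ) = @_;
    my $twig = XML::Twig->new( pretty_print => 'indented' );
    $twig->parsefile($in_path);    # dies if the file is not well-formed
    my $xml = $twig->sprint;       # serialise the whole tree to a string
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    print {$out} $xml;
    close $out or die "Cannot close $out_path: $!";
    return $xml;
}

# Demo with throwaway files (assumed names) so the sketch is self-contained.
my $dir = tempdir( CLEANUP => 1 );
my ( $in_path, $out_path ) = ( "$dir/input.xml", "$dir/output.xml" );
open my $in, '>', $in_path or die "Cannot write $in_path: $!";
print {$in} '<section><para>text</para></section>';
close $in or die "Cannot close $in_path: $!";

print reformat_file( $in_path, $out_path );
```

In a real script you would replace the demo paths with your own input and output files.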

Note: this does a 'full split' of the document into essentially two separate documents, but the technique should also work within a subtree, because the core of it is locating your split element and deleting everything before or after it as necessary.
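
One thing to be aware of: the loops above only remove siblings of <split/> itself. In the question's input, the text after </list> would therefore stay in the first document, and the earlier list items would stay in the second. A possible extension, offered only as a sketch, is to repeat the same deletion for each ancestor of <split/>. The helper names trim_side and split_xml are made up for this example; first_elt, ancestors, next_sibling, prev_sibling and delete are XML::Twig methods.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical helper: remove everything on one side of the first <split/>,
# at every level from the marker up to the root, then remove the marker.
# $direction is 'next_sibling' (keep what comes before the marker)
# or 'prev_sibling' (keep what comes after it).
sub trim_side {
    my ( $twig, $direction ) = @_;
    my $split = $twig->first_elt('split') or return;    # no marker: do nothing
    for my $node ( $split, $split->ancestors ) {
        while ( my $sibling = $node->$direction ) {
            $sibling->delete;
        }
    }
    $split->delete;
    return;
}

# Hypothetical wrapper: parse the same XML twice and trim each copy.
sub split_xml {
    my ($xml) = @_;
    my $before = XML::Twig->new->parse($xml);
    my $after  = XML::Twig->new->parse($xml);
    trim_side( $before, 'next_sibling' );
    trim_side( $after,  'prev_sibling' );
    return ( $before->sprint, $after->sprint );
}

# Sample input shaped like the question's, with made-up text.
my $xml = <<'XML';
<section><para>intro<list><list-item><para>one</para></list-item><list-item><para>two <split/> three</para></list-item></list>outro</para></section>
XML

my ( $first, $second ) = split_xml($xml);
print "$first\n--- second doc ---\n$second\n";
```

Parsing the input twice keeps the two halves independent; copying the root as in the code above should work just as well.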

Upvotes: 2
