Reputation: 41
I'm using the XML::Twig module to remove all comments from an XML file. Here is a sample file:
<?xml version="1.0" encoding="UTF-8"?>
<Node_A>
node A content 1
<!-- One Line Comment A1-->
<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]>
<!-- Two Line Comment
Two Line Comment-->
node A content 3
<!-- Two Line Comment
Two Line Comment-->
<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]>
<!-- Two Line Comment
Two Line Comment-->
<![CDATA[
this portion is fine]]>
<Node_B> node B content
<Node_C> node c content
</Node_C>
<!-- One Line Comment -->
some data one
<!-- Multi Line Comment
Line 3Comment
1Line Comment
2Line Comment
Line 5Comment
Line Comment-->
some data again two
<!-- Multi Line Comment
Line 3Comment
Line 5Comment
Line Comment-->
few more
</Node_B>
</Node_A>
This is the script I used:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $infile = 'demo.xml';
my $twig = XML::Twig->new (comments => 'drop', pretty_print => 'indented')->parsefile($infile);
$twig->print ();
This script also removes the CDATA sections that sit between two comments, which is not what I want. The output is:
<?xml version="1.0" encoding="UTF-8"?>
<Node_A>
node A content 1
<![CDATA[
this portion is fine]]><Node_B> node B content
<Node_C> node c content
</Node_C>
some data one
some data again two
few more
</Node_B></Node_A>
What do I need to add so that all the CDATA sections and everything else are kept as they are, and only the comments are removed?
Thanks in advance.
Upvotes: 4
Views: 961
Reputation: 62109
When I run your script with the demo.xml file you posted, I get the output:
<?xml version="1.0" encoding="UTF-8"?>
<Node_A>
node A content 1
<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]>
node A content 3
<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]><![CDATA[
this portion is fine]]><Node_B> node B content
<Node_C> node c content
</Node_C>
some data one
some data again two
few more
</Node_B></Node_A>
That looks correct to me: both CDATA sections are preserved and only the comments are gone. I suspect you have a buggy version of XML::Twig (or of XML::Parser, which it depends on). I'm using Perl 5.14.2, XML::Twig 3.35, and XML::Parser 2.41.
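To compare your setup with mine, something like the following sketch should print the versions you have installed. The helper name module_version is just something I made up for illustration; it relies only on the standard VERSION method that every Perl package inherits, and it reports "not installed" when a module cannot be loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Twig): return the installed
# version of a module, or nothing if the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    # Accept only plain module names before handing them to string eval.
    return unless $module =~ /\A\w+(?:::\w+)*\z/;
    return unless eval "require $module; 1";
    return $module->VERSION;
}

# Print the Perl version, then the version of each module of interest.
printf "Perl %vd\n", $^V;
for my $module (qw(XML::Twig XML::Parser)) {
    my $version = module_version($module);
    printf "%-12s %s\n", $module, defined $version ? $version : 'not installed';
}
```

If the numbers it prints are noticeably older than the ones I listed, upgrading both modules from CPAN and rerunning your script would be the first thing I'd try.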
Upvotes: 4