I have an XML file that is not bound by lines, so a tag pair is not guaranteed to sit on a single line. It has the tags <tag1> and </tag1>, whose contents are garbage values left by the code that generated it (I am not able to fix that right now). I would like to replace the characters within these tags to correct them. The characters are sometimes special (non-ASCII).
I have this Perl one-liner that shows me the contents between the tags, but now I want to replace what it finds in the file itself.
perl -0777 -ne 'while (/(?<=perform_cnt).*?(?=\<\/perform_cnt)/s) {print $& . "\n"; s/perform_cnt.*?\<\/perform_cnt//s}' output_error.txt
Here's an example of the XML. Notice the junk characters between the perform_cnt tags:
<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>
I need to replace these with something like 0.
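The read-only loop in that one-liner can be turned into an edit by swapping the print-and-delete loop for a single global substitution. A minimal sketch, assuming the goal is to put 0 between every pair of perform_cnt tags; the helper name `zero_tag_contents` is hypothetical, not from any module:

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): replace the body of every
# <$tag>...</$tag> pair in $text with $new and return the result.
#   \Q...\E  treats the tag name literally
#   .*?      is non-greedy, so each pair is matched separately
#   /s       lets . match newlines, for pairs that span line breaks
#   /g       replaces every pair, not just the first
sub zero_tag_contents {
    my ($text, $tag, $new) = @_;
    $text =~ s{(<\Q$tag\E>).*?(</\Q$tag\E>)}{$1$new$2}gs;
    return $text;
}
```

For example, zero_tag_contents('<perform_cnt>ÈPm=</perform_cnt>', 'perform_cnt', '0') returns '<perform_cnt>0</perform_cnt>'. Since -0777 already slurps the whole file, the same substitution can also be applied in place from the command line, keeping the original as output_error.txt.bak:
perl -0777 -pi.bak -e 's{(<perform_cnt>).*?(</perform_cnt>)}{${1}0$2}gs' output_error.txt
The braces in ${1}0 matter: written as $10, Perl would read it as capture group 10.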
I love XML::Twig for these sorts of things. It takes a little getting used to, but once you understand the design (and a little about DOM processing), many things become extremely easy:
use strict;
use warnings;
use feature 'say';   # say() is used in the handler below
use XML::Twig;
my $xml = <<'HERE';
<root>
<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>
</root>
HERE
my $twig = XML::Twig->new(
twig_handlers => {
perform_cnt => sub {
say "Text is " => $_->text; # get the current text
$_->set_text( 'Buster' ); # set the new text
},
},
pretty_print => 'indented',
);
$twig->parse( $xml );
$twig->flush;
With indented pretty printing, I get:
<root>
<text1>120105728</text1>
<perform_cnt>Buster</perform_cnt>
<text1>120106394</text1>
<perform_cnt>Buster</perform_cnt>
</root>
Here is a plain-regex version that processes the file line by line:
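#!/usr/bin/perl
use strict;
use warnings;
my $tag = 'perform_cnt';
open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
while (my $line = <$fh>) {
    # Keep the opening ($1) and closing ($3) tags, drop whatever was between them.
    # \Q...\E makes the tag name match literally.
    $line =~ s/(<\Q$tag\E>)(.*?)(<\/\Q$tag\E>)/$1$3/g;
    print $line;
}
close $fh;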
And output is:
<text1>120105728</text1><perform_cnt></perform_cnt>
<text1>120106394</text1><perform_cnt></perform_cnt>
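That loop reads one line at a time and only prints the result, so a pair split across a line break (the question says the file is not bound by lines) would be missed, and the input file itself is never changed. A minimal sketch of a variant that slurps the file and writes it back in place, assuming a .bak backup is wanted and that the contents should become 0 rather than empty; `rewrite_file` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path in place, replacing the body of every
# <$tag>...</$tag> pair with $new. The original file is kept as "$path.bak".
# Returns the number of pairs that were replaced.
sub rewrite_file {
    my ($path, $tag, $new) = @_;

    # Slurp the whole file (local $/ is undef) so that a pair spanning a
    # newline is still matched; :raw passes the junk bytes through untouched.
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;

    # s///g in scalar context returns the number of substitutions made.
    my $count = $text =~ s{(<\Q$tag\E>).*?(</\Q$tag\E>)}{$1$new$2}gs;

    # Move the original aside, then write the corrected text under its name.
    rename $path, "$path.bak" or die "Cannot back up $path: $!";
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";

    return $count || 0;
}
```

Usage would be something like my $n = rewrite_file('output_error.txt', 'perform_cnt', '0'); the returned count is a quick sanity check that the expected number of values was replaced.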