cbm64

Reputation: 1119

Find and replace characters between XML tags

I have an XML file that is not line-oriented. It contains the tags <tag1> and </tag1>, whose contents are garbage values left by the code that generated the file (I am not able to fix that code right now). I would like to replace the characters between these tags to correct them. Some of the characters are special (non-ASCII).

I have this Perl one-liner that shows me the contents between the tags, but now I want to replace what it finds in the file itself.

perl -0777 -ne 'while (/(?<=perform_cnt).*?(?=\<\/perform_cnt)/s) {print $& . "\n";      s/perform_cnt.*?\<\/perform_cnt//s}' output_error.txt

Here's an example of the XML. Notice the junk characters between the perform_cnt tags.

<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>

I need to replace these with something like 0.
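A minimal sketch of doing the replacement in the file rather than only printing the matches, assuming the tag is literally perform_cnt and that 0 is an acceptable placeholder; replace_tag_contents is a hypothetical helper name. Like the -0777 switch in the one-liner above, it reads the whole file as one string, so the /s modifier lets .*? match junk that contains newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace everything between <$tag> and </$tag>
# with $replacement. /s lets .*? cross newlines (the file is not
# line-oriented); \Q...\E makes the tag name match literally.
sub replace_tag_contents {
    my ($text, $tag, $replacement) = @_;
    $text =~ s{(<\Q$tag\E>).*?(</\Q$tag\E>)}{$1$replacement$2}gs;
    return $text;
}

# Only touch a file when one is named on the command line.
if (@ARGV) {
    my $file = $ARGV[0];

    # :raw keeps the junk bytes as plain bytes instead of decoding them.
    open my $in, '<:raw', $file or die "Cannot read $file: $!";
    my $content = do { local $/; <$in> };    # slurp, like perl -0777
    close $in;

    open my $out, '>:raw', $file or die "Cannot write $file: $!";
    print {$out} replace_tag_contents($content, 'perform_cnt', '0');
    close $out or die "Cannot close $file: $!";
}
```

Run it as `perl fix_tags.pl output_error.txt` (fix_tags.pl is a hypothetical name); it overwrites the file, so keep a copy. A roughly equivalent one-liner that keeps a .bak backup is `perl -0777 -pi.bak -e 's{(<perform_cnt>).*?(</perform_cnt>)}{${1}0$2}gs' output_error.txt`, where the braces in `${1}` stop Perl from reading `$10` as a capture group.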

Upvotes: 1

Views: 2655

Answers (2)

brian d foy

Reputation: 132802

I love XML::Twig for these sorts of things. It takes a little getting used to, but once you understand the design (and a little about DOM processing), many things become extremely easy:

use strict;
use warnings;
use feature 'say';   # the handler below uses say()
use XML::Twig;

my $xml = <<'HERE';
<root>
<text1>120105728</text1><perform_cnt>ÈPm=</perform_cnt>
<text1>120106394</text1><perform_cnt>†AQ;4K\_Ô23{YYÔ@Nx</perform_cnt>
</root>
HERE

my $twig = XML::Twig->new(
    twig_handlers => {
        perform_cnt => sub {
            say "Text is " => $_->text;   # get the current text

            $_->set_text( 'Buster' );     # set the new text
        },
    },
    pretty_print => 'indented',
);

$twig->parse( $xml );
$twig->flush; 

With indented pretty printing, I get:

<root>
  <text1>120105728</text1>
  <perform_cnt>Buster</perform_cnt>
  <text1>120106394</text1>
  <perform_cnt>Buster</perform_cnt>
</root>
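To run the same kind of handler over the question's file instead of an inline string, a sketch like the following may work. It assumes output_error.txt is well-formed XML with a single root element (the sample in the question is a fragment, which is why it is wrapped in <root> above); make_twig is a hypothetical helper name. An XML parser rejects bytes that are not valid in the document's encoding (UTF-8 when none is declared), so if the junk is arbitrary binary, parsefile will die and a byte-level regex may be the only practical fallback.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: build a twig whose handler sets the text of
# every <perform_cnt> element to '0' once that element is parsed.
sub make_twig {
    return XML::Twig->new(
        twig_handlers => {
            perform_cnt => sub {
                my ($t, $elt) = @_;
                $elt->set_text('0');   # discard the junk, keep the element
                return 1;
            },
        },
    );
}

# Only read a file when one is named on the command line; the fixed
# document goes to STDOUT so the original file is left untouched.
if (@ARGV) {
    my $twig = make_twig();
    $twig->parsefile($ARGV[0]);   # dies if the file is not well-formed XML
    $twig->print;
}
```

For example, `perl zero_cnt.pl output_error.txt > fixed.xml` (script name hypothetical) writes the corrected document to a new file.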

Upvotes: 8

Ωmega

Reputation: 43673

It is bad practice to parse XML with regular expressions.

Anyway, here is the code:

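#!/usr/bin/perl

use strict;
use warnings;

my $tag = 'perform_cnt';

# Three-argument open with a lexical filehandle
open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!";
while (my $line = <$fh>) {
  # Keep the opening and closing tags, drop whatever is between them
  $line =~ s/(<\Q$tag\E>)(.*?)(<\/\Q$tag\E>)/$1$3/g;
  print $line;
}
close $fh;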

And output is:

<text1>120105728</text1><perform_cnt></perform_cnt>
<text1>120106394</text1><perform_cnt></perform_cnt>
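The regex above handles the exact layout shown in the question, which is also why the "bad practice" warning applies. The sketch below reuses the same substitution on a few hypothetical inputs (not taken from the question's file) that are all legal XML: an attribute on the opening tag and contents spanning a newline are left untouched, while a commented-out element is modified even though it is not real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tag = 'perform_cnt';

# Same substitution as the loop above: keep the tags, drop the contents.
sub strip_contents {
    my ($text) = @_;
    $text =~ s/(<$tag>)(.*?)(<\/$tag>)/$1$3/g;
    return $text;
}

# Hypothetical inputs, all of which are legal XML.
my @cases = (
    '<perform_cnt>junk</perform_cnt>',            # stripped, as intended
    '<perform_cnt id="1">junk</perform_cnt>',     # attribute: not matched
    "<perform_cnt>ju\nnk</perform_cnt>",          # spans a newline: not matched
    '<!-- <perform_cnt>keep</perform_cnt> -->',   # inside a comment: stripped anyway
);

for my $case (@cases) {
    my $verdict = strip_contents($case) eq $case ? 'unchanged' : 'changed';
    (my $shown = $case) =~ s/\n/\\n/g;   # make the newline visible
    print "$verdict: $shown\n";
}
```

A real parser such as XML::Twig sees through all four cases, which is the main argument for it over the regex.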

Upvotes: 0
