user1811486

Reputation: 1314

How to replace XML tags using a checklist of regexes

I have an XML file and a checklist of replacements to apply to it. How can I store the regexes in the checklist and use them to rewrite the XML file? I tried the approach below, but it doesn't work correctly. How can I do this?

What I tried:

Input XML:

<xml>
<p class="text">The <em type="italic">end</em> of the text</p>
<p class="text">The <bold type="strong">end of the</bold> text</p>
<p class="text">The end of <samll type="caps">the<small> text</p>
</xml>

Script:

use strict;
open(IN, "xml_file.xml") || die "can't open $!";
my $text = join '', <IN>;
my @ar = '';
my $testing;
foreach my $t (<DATA>){
    @ar = split /\t/, $t;
    chomp($ar[0]);
    chomp($ar[1]);
    $text =~ s/$ar[0]/$ar[1]/segi;
}
print $text;

__END__
<p([^>]+)?>	<line>
<small([^>]+)?>	<sc$1>
<bold type=\"([^"]+)\">	<strong act=\"$1\">
<(\/)?em([^>]+)?>	<$1emhasis$2>

Desired output:

<xml>
<line>The <emhasis type="italic">end</emhasis> of the text</line>
<line>The <strong act="strong">end of the</strong> text</line>
<line>The end of <sc type="caps">the<sc> text</line>
</xml>

How can I apply the tag regexes from the checklist, and how can I use the values captured by the pattern groups in the replacement?

Upvotes: 0

Views: 149

Answers (2)

Jithin

Reputation: 2604

With reference to an old SO post, you need to use a double-eval substitution (the /ee modifier).

I couldn't get it working with <DATA>, but the code below works. You can structure @replace however you like; I just created a simple one.

my $text = <<XML;
<xml>
<p class="text">The <em type="italic">end</em> of the text</p>
<p class="text">The <bold type="strong">end of the</bold> text</p>
<p class="text">The end of <small type="caps">the</small> text</p>
</xml>
XML

my @replace = (
    {
        'select' => '<p([^>]+)?>',
        'replace' => '"<line$1>"'
    },
    {
        'select' => '/p>',
        'replace' => '"/line>"'
    },
    {
        'select' => '<small([^>]+)?>',
        'replace' => '"<sc$1>"'
    },
    {
        'select' => '/small>',
        'replace' => '"/sc>"'
    },
    {
        'select' => '<bold\s+type="(.+?)".*?>',
        'replace' => '"<strong act=\"$1\">"'
    },
    {
        'select' => '/bold>',
        'replace' => '"/strong>"'
    },
    {
        'select' => '<em([^>]+)?>',
        'replace' => '"<emhasis$1>"'
    },
    {
        'select' => '/em>',
        'replace' => '"/emhasis>"'
    },
);

# /ee evaluates each replacement string as Perl code, so $1 is filled in per match.
for my $re (@replace) {
    $text =~ s/$re->{select}/$re->{replace}/sigee;
}

print $text;
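
The same /ee idea can probably be combined with a tab-separated checklist like the one in the question. The following is only a sketch under some assumptions: apply_checklist is a hypothetical helper name, each checklist line is "pattern<TAB>replacement", and any literal double quote in the replacement is already escaped as \" (as in the question's data). Because /ee runs the replacement as Perl code, the checklist has to come from a trusted source.

```perl
use strict;
use warnings;

# Apply "pattern<TAB>replacement" lines to $text.
# apply_checklist is a hypothetical helper, not part of the original answer.
sub apply_checklist {
    my ($text, @lines) = @_;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;               # skip blank lines
        my ($pattern, $replacement) = split /\t/, $line, 2;
        next unless defined $replacement;        # skip lines without a tab
        # Wrap the replacement in double quotes so that the second eval
        # of /ee interpolates $1, $2, ... at match time.
        my $quoted = qq{"$replacement"};
        no warnings 'uninitialized';             # optional groups may be undef
        $text =~ s/$pattern/$quoted/gee;
    }
    return $text;
}

my $text = <<'XML';
<p class="text">The <bold type="strong">end of the</bold> text</p>
XML

# Built with join so the tab separator is explicit.
my @checklist = (
    join("\t", '<p(?:[^>]+)?>',         '<line>'),
    join("\t", '</p>',                  '</line>'),
    join("\t", '<bold type="([^"]+)">', '<strong act=\"$1\">'),
    join("\t", '</bold>',               '</strong>'),
);

print apply_checklist($text, @checklist);
```

If a replacement fails to compile in the second eval, the match is replaced with an empty string and the error ends up in $@, so it may be worth checking $@ while debugging a checklist.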

Upvotes: 1

Miguel Prz

Reputation: 13792

Simply add:

$ar[0] = qr/$ar[0]/;

just before executing the regex substitution.

Also, you forgot this pattern:
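
As a small illustration of what qr// does with a pattern held in a string (the variable names here are made up for the example): it compiles the text into a regex object that can be reused in s///. Note that qr// only affects the pattern side; a replacement read from data that contains $1 is still a plain string, so it would need something like the /ee approach from the other answer.

```perl
use strict;
use warnings;

my $pattern_text = '</p>';               # pattern as it might be read from the checklist
my $pattern      = qr/$pattern_text/i;   # compile once, reuse for every substitution

my $text = "<p>one</p>\n<p>two</p>\n";

# Copy $text, then substitute on the copy; the replacement is literal text.
(my $result = $text) =~ s{$pattern}{</line>}g;

print $result;
```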

</p>    </line>

You also have a typo in the input XML:

<samll type="caps">

should be

<small type="caps">

And finally, a piece of advice: parsing XML with regular expressions is not a good idea. I recommend using an XML parser from CPAN instead; it is a better choice (IMO).
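
For illustration, here is a rough sketch of the parser-based route using XML::LibXML, which is one of several CPAN options and has to be installed separately. The %rename map and the convert_xml helper are hypothetical names that mirror the question's checklist. A parser will reject the question's input until the <samll>/<small> typo is fixed, because that line is not well-formed.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module, assumed to be installed

# Hypothetical tag mapping, taken from the question's checklist.
my %rename = (p => 'line', em => 'emhasis', small => 'sc', bold => 'strong');

sub convert_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    for my $node ($doc->findnodes('//*')) {
        my $name = $node->nodeName;
        next unless exists $rename{$name};
        if ($name eq 'p') {
            # the desired output drops the attributes of <p>
            $node->removeAttribute($_->nodeName) for $node->attributes;
        }
        elsif ($name eq 'bold') {
            # type="..." becomes act="..."
            my $type = $node->getAttribute('type');
            $node->removeAttribute('type');
            $node->setAttribute(act => $type) if defined $type;
        }
        $node->setNodeName($rename{$name});
    }
    # serialize only the root element, without the XML declaration
    return $doc->documentElement->toString;
}

my $xml = '<xml><p class="text">The <em type="italic">end</em> of the text</p></xml>';
print convert_xml($xml), "\n";
```

Closing tags are handled automatically here, since the parser renames whole elements rather than matching opening and closing tags as separate strings.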

Upvotes: 0
