CyberDude

Reputation: 26

Replacing the regex using perl

I am trying to do a regex replacement using Perl. I call sed from within Perl, but it doesn't seem to work.

Sample line to be replaced in the file trans.xml:

&#x0027;fairness&#x0027; and &#x0027;efficiency&#x2019;

I need to replace &#x0027;efficiency&#x2019; with &#x2018;efficiency&#x2019;.

I tried the code below:

system "sed -e 's/\&\#x0027\;\([a-zA-Z0-9 _]*\)\&\#x2019\;/tooch&/g' trans.xml > tmp.xml";
system "sed -e 's/tooch\&\#x0027\;/\&\#x2018\;/g' tmp.xml > trans.xml"

The above sed commands work when run manually, but not from inside Perl.

Any help would be greatly appreciated !!

Upvotes: 0

Views: 158

Answers (2)

Kent Fredric

Reputation: 57354

A few serious problems:

  1. Why are you calling sed? Sure, maybe IO is harder to do in Perl, but Perl has regexps built in.

    use Path::Tiny qw(path);
    my $content = path('trans.xml')->slurp;
    $content =~ s/bar/baz/g;
    $content =~ s/foo/bar/g;
    path('trans.xml')->spew( $content );
    

    Note: if trans.xml is UTF-8 encoded, all you have to do here is replace slurp/spew with slurp_utf8/spew_utf8. Compare that with sed, which may be ignorant of Unicode.

  2. system with a single string should be avoided where possible, for many reasons. One is the problem you've experienced: quoting is hard.

    system('sed', '-e', $regexp )
    

    is the preferred syntax wherever possible. Note that you can't combine this form with shell redirection, but you really don't need to.

  3. Multiple calls to sed are not needed:

    sed 's/foo/bar/g;s/bar/baz/g'
    

    This applies both expressions in a single sed run.

  4. Once you realise #3, the temporary file is not required either:

    sed -i 's/foo/bar/g;s/bar/baz/g' $file
    

    This modifies $file in place (GNU sed syntax; BSD/macOS sed needs an explicit suffix argument, e.g. -i '').

  5. When using system, you probably want to check the return value.
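
To make points 1, 2 and 5 concrete for the entities in the question, here is a minimal sketch. The helper names `fix_opening_quote` and `run_checked` are made up for illustration, and the lookahead regex is only my reading of what the two sed passes are meant to achieve, so treat it as an assumption rather than a drop-in fix.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite an opening &#x0027; to &#x2018; when it starts
# a run of letters, digits, spaces or underscores closed by &#x2019;.
sub fix_opening_quote {
    my ($text) = @_;
    # The lookahead checks for the closing entity without consuming it,
    # so only the opening entity is replaced.
    $text =~ s/&#x0027;(?=[a-zA-Z0-9 _]*&#x2019;)/&#x2018;/g;
    return $text;
}

# Hypothetical helper: list-form system with a checked return value.
sub run_checked {
    my @cmd = @_;
    # With more than one argument no shell is involved, so the arguments
    # need no shell quoting.
    my $rc = system(@cmd);
    die "could not run $cmd[0]: $!\n" if $rc == -1;
    die "$cmd[0] failed, \$? = $?\n"  if $rc != 0;
    return 1;
}

my $sample = '&#x0027;fairness&#x0027; and &#x0027;efficiency&#x2019;';
print fix_opening_quote($sample), "\n";

# $^X is the running perl binary, used here only as a harmless command.
run_checked($^X, '-e', 'exit 0');
```

With Path::Tiny, `fix_opening_quote` would be applied to the slurped content of trans.xml before spewing it back, which removes the need for sed, the shell and tmp.xml altogether.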

Upvotes: 0

JB.

Reputation: 42094

You're a victim of the double quotes.

Replacing your system call with say will show you more clearly what's going on:

sed -e 's/&#x0027;([a-zA-Z0-9 _]*)&#x2019;/tooch&/g' trans.xml > tmp.xml
sed -e 's/tooch&#x0027;/&#x2018;/g' tmp.xml > trans.xml

See what's wrong? There are no backslashes left. They've been interpreted by the Perl double quotes, and are not there for sed to use.
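
If you want to see the effect without running sed at all, a small sketch like the following compares the same command text under both quoting styles. The helper `count_backslashes` is a made-up name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the literal backslashes in a string.
sub count_backslashes {
    my ($s) = @_;
    my $n = () = $s =~ /\\/g;
    return $n;
}

# The same sed command text, written with two different Perl quoting styles.
my $double = "sed -e 's/tooch\&\#x0027\;/\&\#x2018\;/g' tmp.xml > trans.xml";
my $single = q(sed -e 's/tooch\&\#x0027\;/\&\#x2018\;/g' tmp.xml > trans.xml);

print "double: $double\n";
print "single: $single\n";

# Double quotes consume every backslash; q() leaves them for sed.
print count_backslashes($double), "\n";   # prints 0
print count_backslashes($single), "\n";   # prints 6
```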

Your case is a bit tricky to correct, since you already use (and need) the single quotes to pass to sed. You could theoretically escape what's needed one more time, but that's error-prone. It's much better to use Perl's other single-quoting facilities:

system q+sed -e 's/\&\#x0027\;\([a-zA-Z0-9 _]*\)\&\#x2019\;/tooch&/g' trans.xml > tmp.xml+;
system q(sed -e 's/tooch\&\#x0027\;/\&\#x2018\;/g' tmp.xml > trans.xml);

I used + as a separator on the first line because it happened not to be used in the string itself. I used plain parentheses in the second line because they were 100% unambiguous there.
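
The delimiter choice matters more than it may look: inside q(...), a backslash directly before either parenthesis is removed by Perl, so the \( and \) of the first command would not survive. A minimal sketch with a shortened, made-up sed expression (not the one from the question) shows the difference:

```perl
use strict;
use warnings;

# Shortened illustrative expression containing escaped parentheses.
my $with_plus   = q+s/\(a\)/x/+;   # + is not the escaped character, backslashes stay
my $with_parens = q(s/\(a\)/x/);   # \( and \) escape the delimiters, backslashes go

print "$with_plus\n";     # s/\(a\)/x/
print "$with_parens\n";   # s/(a)/x/
```

So any character that does not occur in the string is a safe delimiter, and parentheses are fine as long as the string contains no escaped ones.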

Upvotes: 1
