nachum

Reputation: 567

Perl regex: replace only outside of strings

I have strings in which I need to find variables and replace them with values. E.g.:

my $str = "var1 var2 blah blah blah var3";

Sometimes the strings have embedded strings:

my $str = "var1 var2 blah \"do not replace this: var1\" blah blah var3";

So I built a regex that matches both strings and variables. When it matches a string, it replaces the string with itself; when it matches a variable, it replaces it with the corresponding value from a hash. To make this work as a single substitution, I split each match into two captures: the named group (macro) and the last capture group. For strings, I capture the opening quote character (") in the named group and the rest of the string in the last group. For variables, I capture the whole variable name in the named group and nothing in the last group. To handle strings, I add the hash entry $variables->{'"'} = '"'. Each match is then replaced by the hash lookup followed by the last capture. This works well, although it seems awkward.

$line =~ s/(?:(?<macro>(?<!\\)")(.*?(?<!\\)")|(?<macro>(``|\b($list_of_hash_keys)\b))())/$variables->{$+{macro}}$+/gs;
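For reference, here is a minimal self-contained sketch of the approach described above. The hash contents are hypothetical, the double-backtick alternative from the question is left out (the question doesn't say what it is for), and the inner groups are reduced so that the last group is the empty one. It relies on $+ being the highest-numbered capture group that actually participated in the match.

```perl
use strict;
use warnings;

# Hypothetical variable table; the '"' entry maps a quote back to itself
# so that quoted strings are reproduced unchanged.
my $variables = {
    var1 => 'mod1',
    var2 => 'mod2',
    var3 => 'mod3',
    '"'  => '"',
};

# Alternation of the real variable names (everything except the quote entry).
my $list_of_hash_keys = join '|', map { quotemeta } grep { $_ ne '"' } keys %$variables;

my $line = 'var1 var2 blah "do not replace this: var1" blah blah var3';

# Alternative 1: opening quote -> macro, rest of the string -> group 2.
# Alternative 2: variable name -> macro, empty group 4 -> last group.
# $+ is the text of the highest-numbered group that participated.
$line =~ s/(?:(?<macro>(?<!\\)")(.*?(?<!\\)")|(?<macro>\b(?:$list_of_hash_keys)\b)())/$variables->{$+{macro}}$+/gs;

print "$line\n";
```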

Is there a cleaner, more elegant way to do this with a regex?

Upvotes: 1

Views: 111

Answers (3)

nachum

Reputation: 567

The answer is (*SKIP)(*FAIL). Match the quoted string and follow it with (*SKIP)(*FAIL): the match then fails, and the regex engine resumes searching after the end of the string, so nothing inside it is replaced.
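The original regex in the question also tolerated backslash-escaped quotes. A minimal sketch combining that with (*SKIP)(*FAIL), assuming escapes take the form of a backslash followed by any single character; the %vars hash and the sample string are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical variable table for illustration.
my %vars = (var1 => 'mod1', var2 => 'mod2', var3 => 'mod3');
my $names = join '|', map { quotemeta } sort keys %vars;

# In single quotes, \" stays as a backslash followed by a quote.
my $str = 'var1 "say \"var2\" here" var3';

# A quoted string may contain escapes such as \" ; once the whole string
# has matched, (*SKIP)(*FAIL) discards it and the engine resumes after
# its closing quote, so nothing inside it is replaced.
(my $out = $str) =~ s/"(?:\\.|[^"\\])*"(*SKIP)(*FAIL)|\b($names)\b/$vars{$1}/g;

print "$out\n";
```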

Upvotes: 0

Toto

Reputation: 91528

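use Modern::Perl;   # CPAN module: enables strict, warnings and say

my @in = (
    "var1 var2 blah blah blah var3",
    "var1 var2 blah \"do not replace this: var1\" blah blah var3",
);
my $variables = {
    var1 => "mod1",
    var2 => "mod2",
    var3 => "mod3",
    var4 => "mod4",
};
# Builds \b(var1|var2|...)\b, so the matched key is captured in group 1
my $list_of_hash_keys = '\b(' . join('|',keys(%$variables)) . ')\b';
for (@in) {
    # Skip quoted strings entirely; replace every other key with its value
    s/"[^"]+"(*SKIP)(*FAIL)|$list_of_hash_keys/$variables->{$1}/g;
    say
}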

Output:

mod1 mod2 blah blah blah mod3
mod1 mod2 blah "do not replace this: var1" blah blah mod3

Explanation:

"                       # quote
[^"]+                   # 1 or more non-quote characters
"                       # quote
(*SKIP)                 # if the match fails after this point, don't retry inside the quoted text; resume after the closing quote
(*FAIL)                 # force the match to fail, so the quoted string is never replaced
|                       # OR
$list_of_hash_keys      # list of keys to match, captured in group 1
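To see why (*SKIP) is needed, compare the same substitution with and without it. This is a small sketch with a hypothetical one-entry hash: without (*SKIP), (*FAIL) only makes the engine backtrack and retry from the next character, which lies inside the quotes, so the variable in the string gets replaced anyway.

```perl
use strict;
use warnings;

# Hypothetical one-entry table and input.
my %vars = (var1 => 'mod1');
my $str  = 'var1 "keep var1" var1';

# With (*SKIP): the next attempt starts after the closing quote.
(my $with_skip = $str) =~ s/"[^"]+"(*SKIP)(*FAIL)|\b(var1)\b/$vars{$1}/g;

# Without (*SKIP): the next attempt starts one character later,
# inside the quoted string.
(my $without_skip = $str) =~ s/"[^"]+"(*FAIL)|\b(var1)\b/$vars{$1}/g;

print "$with_skip\n";
print "$without_skip\n";
```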

Upvotes: 0

gugod

Reputation: 830

It looks like you're trying to implement a mini templating mechanism. :)

I'm not sure if the following is beautiful, but here's my approach:

my $out = $str =~ s{
        (?<str> " [^"]+ " ) |                      # quoted string, kept as-is
        (?<macro> \b (?: $list_of_hash_keys ) \b ) # group the keys so both word boundaries apply to each one
    }{
        $+{str} // $variables->{$+{macro}}
    }gsxre;

As you can see, the /e modifier is used: it evaluates the replacement as Perl code. In this case, that removes the need for the special '"' entry in the $variables hash.
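A self-contained sketch of the /e approach, with hypothetical hash contents and input. Because the replacement is code, a matched string is returned unchanged and only variable names are looked up; /r (Perl 5.14+) returns the new string and leaves $str untouched.

```perl
use strict;
use warnings;

# Hypothetical inputs; note there is no '"' entry in the hash.
my $variables = { var1 => 'mod1', var2 => 'mod2', var3 => 'mod3' };
my $list_of_hash_keys = join '|', map { quotemeta } sort keys %$variables;

my $str = 'var1 var2 blah "do not replace this: var1" blah blah var3';

# /e: the replacement is evaluated as code.
# /r: return the modified copy instead of changing the original.
my $out = $str =~ s{
        (?<str> " [^"]+ " )
      | (?<macro> \b (?: $list_of_hash_keys ) \b )
    }{
        $+{str} // $variables->{ $+{macro} }
    }gxre;

print "$out\n";
```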

The (?<str> ...) group captures an embedded string, assuming it contains no escape sequences such as \". I have not tested it thoroughly; I don't think this approach is exactly equivalent to yours, and I don't know whether it handles every edge case.

But I think this should be enough to demonstrate the idea.

Upvotes: 1
