user2023370

Conditional subexpression replacement using regular expressions

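I have text input similar to that shown below. I'd like to add the word auto before each a=b pattern, but only if it is part of a semicolon-separated sequence following the keyword kywrd. The keyword itself should become kywrd2, and the sequence ends at the first statement that is not a pair (here fnctn z;).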

kywrd a=b;c=d;
e=f;
fnctn z;
g=h;

So the output I'm looking for here is:

kywrd2 auto a=b;auto c=d;
auto e=f;
fnctn z;
g=h;

The Perl 6 (Raku?) code below uses a regular expression to add the auto keyword, but only before the first a=b pattern. Is there a simple way to perform the substitution for all the pairs in the sequence, while leaving g=h; unmodified?

my Str $x = slurp "in.q";
$x ~~ s:g /kywrd\s+(\w+)\=(\w+)\;/kywrd2 auto $0=$1\;/;
spurt "out.q", $x;

Upvotes: 7

Answers (3)

raiph

One way:

# Create a separate named regex that captures an `x=y;` pair:
my regex pair { (\w+) \= (\w+) \; (\s*) }
# (Capture `(\s*)` so formatting between pairs is retained)

# Generate and return 'auto'-ized replacement of a captured pair: 
sub auto-ize ($/) { "auto $0=$1;$2" }

$x ~~ s:g { kywrd \s+ <pair>+ } = "kywrd2 $<pair>».&auto-ize.join()";

All the code I've shown should be simple to understand for someone a little familiar with Raku, but I'll explain it anyway.

  • I've broken out a named regex to match a pair. (See my answer to Difference in capturing and non-capturing regex scope in Perl 6 / Raku for details about why/how <pair> calls the pair regex.)

  • The auto-ize subroutine declares the match variable ($/) as its parameter. This is convenient because $0 etc. are then automatically aliased to the numbered captures of the match object passed in.

  • I've used syntax of the form s:g { ... } = " ... " because I think it's more readable for this use case. (See the mention of "different delimiters" in the s/// documentation.)

  • The "kywrd2 ..." string is re-evaluated once for each match found by s:g, and each result replaces the corresponding match.

  • The $<pair>».&auto-ize.join() bit is code being interpolated under double quoted string rules.

  • $<pair> is short for $/<pair>, i.e. the <pair> key of $/. It refers to the pair named capture associated with the match variable. The latter will correspond to each match of the multiple s:g matches in turn.

  • The + quantifier in <pair>+ means that, if it matches, it produces a List of capture (match) objects rather than just one (as would be the case if the expression were instead just <pair> or <pair>?).

  • » treats its LHS operand as a tree or list (in this case a list of one or more capture/match objects, one per foo=bar;... pair) and walks over its elements. For each "leaf" element the » does the operation on its right. (» is a powerful operator but has nice simple use cases such as this one where it's just a notationally convenient and compact equivalent of a for loop. You can write it as >> if you prefer ASCII.)

  • .&auto-ize calls the auto-ize subroutine as if it were a method, passing the operand on its left as the first argument.
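To make the » bullet concrete, here is a small sketch (not part of the original answer) that reuses the pair regex and auto-ize sub on a single keyword sequence taken from the question's input, and checks that the hyper call, a .map and an explicit for loop all build the same string.

```raku
# Same named regex and sub as in the answer above
my regex pair { (\w+) \= (\w+) \; (\s*) }
sub auto-ize ($/) { "auto $0=$1;$2" }

# Match one keyword sequence; $<pair> then holds one Match per x=y; pair
"kywrd a=b;c=d;\ne=f;\nfnctn z;" ~~ / kywrd \s+ <pair>+ /;
say $<pair>.elems;    # 3

# Three equivalent ways to auto-ize every captured pair and join the results
my $hyper = $<pair>».&auto-ize.join;
my $map   = $<pair>.map(&auto-ize).join;
my $loop  = '';
for $<pair>.list -> $p { $loop ~= auto-ize($p) }

say $hyper eq $map && $map eq $loop;    # True
print $hyper;
```

The hyper form is simply the most compact spelling; .map reads more familiar to newcomers, and the for loop shows what both are doing.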

The test input data from @PolarBear's answer:

kywrd a=b;c=d;
e=f;
fnctn z;
g=h;
k=m;
fnctn y;
kywrd m=n;
k=j;
kywrd z=a;b=i;
kywrd c=x;e=i;
z=q;
fnctn o;

Putting that into in.q and running the code produces this out.q:

kywrd2 auto a=b;auto c=d;
auto e=f;
fnctn z;
g=h;
k=m;
fnctn y;
kywrd2 auto m=n;
auto k=j;
kywrd2 auto z=a;auto b=i;
kywrd2 auto c=x;auto e=i;
auto z=q;
fnctn o;
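The answer reads in.q and writes out.q. For experimenting without files, a minimal sketch, assuming an in-memory string is acceptable, wraps the same regex in a hypothetical auto-ize-all sub. It swaps the s:g { ... } = "..." form for the method form .subst with a block, which the Raku documentation describes as being called once per match with $/ set to that match.

```raku
# Named regex and helper sub from the answer above
my regex pair { (\w+) \= (\w+) \; (\s*) }
sub auto-ize ($/) { "auto $0=$1;$2" }

# Hypothetical helper: rewrite a whole string instead of in.q/out.q.
# subst calls the block once per match, with $/ set to that match,
# so $<pair> is the list of pairs captured after that kywrd.
sub auto-ize-all (Str $text --> Str) {
    $text.subst(/ kywrd \s+ <pair>+ /, { "kywrd2 " ~ $<pair>.map(&auto-ize).join }, :g)
}

my $input = q:to/END/;
kywrd a=b;c=d;
e=f;
fnctn z;
g=h;
END

print auto-ize-all($input);
```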

Upvotes: 5

Polar Bear

Not very elegant, but workable code (the ancient way, in Perl 5):

#!/usr/bin/perl

use strict;
use warnings;

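OUTER: while (<DATA>) {
    # A kywrd line starts a sequence: rename the keyword
    if ( s/kywrd /kywrd2 / ) {
        do {
            # A kywrd line inside a running sequence needs renaming too
            s/kywrd /kywrd2 /;
            # Prefix every x=y pair; a line without one ends the sequence
            if ( ! s/(\w+)=(\w+)/auto $1=$2/g ) {
                print;
                next OUTER;
            }
            print;
        } while ( <DATA> );
    } else {
        print;
    }
}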

__DATA__
kywrd a=b;c=d;
e=f;
fnctn z;
g=h;
k=m;
fnctn y;
kywrd m=n;
k=j;
kywrd z=a;b=i;
kywrd c=x;e=i;
z=q;
fnctn o;

I need to take a look at Raku and see what kind of animal it is.
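For comparison, a rough Raku translation of the same line-by-line idea, assuming the input arrives as one string. The name auto-ize-lines is hypothetical, and unlike the Perl code the keyword check here is anchored to the start of the line (.starts-with). Checking for kywrd on every line, not only outside a sequence, lets consecutive kywrd lines each become kywrd2.

```raku
# Hypothetical helper: line-by-line state machine, with a flag recording
# whether we are currently inside a kywrd sequence
sub auto-ize-lines (Str $text --> Str) {
    my $in-seq = False;
    my @out;
    for $text.lines -> $line is copy {
        # A line starting with "kywrd " starts (or restarts) a sequence
        if $line.starts-with('kywrd ') {
            $line = 'kywrd2 ' ~ $line.substr(6);    # "kywrd " is 6 characters
            $in-seq = True;
        }
        if $in-seq {
            if $line ~~ / \w+ '=' \w+ / {
                # Prefix every x=y pair on this line
                $line ~~ s:g/ (\w+) '=' (\w+) /auto $0=$1/;
            } else {
                # A line without any pair ends the sequence
                $in-seq = False;
            }
        }
        @out.push: $line;
    }
    # .lines dropped the newlines, so put them back
    @out.join("\n") ~ "\n"
}

print auto-ize-lines("kywrd a=b;c=d;\ne=f;\nfnctn z;\ng=h;\n");
```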

Upvotes: 3

Holli

One possible way that keeps the regexing to a minimum:

sub repl ($input)
{
    $input.Str
    .split(';', :skip-empty)
    .map( 'auto ' ~ * ~ ';')
    .join('')
};

my $foo = 'kywrd a=b;c=d;d=e;';
$foo ~~ s:g /kywrd \s+ (.+)/kywrd2 { repl($0) }/;
$foo.say;

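Personally I'd prefer the method form subst over the s/// operator, though. Pass the replacement as a closure, so that it is evaluated once per match and $0 refers to that match's capture:

$foo .= subst(/ kywrd \s+ (.+) /, { "kywrd2 " ~ repl($0) }, :g);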
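Why a closure matters: a plain string argument to subst is interpolated once, before subst runs, so any $0 inside it would come from whatever $/ held earlier. A small sketch with repl unchanged, assuming one keyword sequence per line (made-up input). Here \N+ is swapped in for .+ because . matches newlines in Raku, so a greedy .+ would swallow the rest of a multi-line input in the first match.

```raku
# Holli's helper, unchanged apart from layout
sub repl ($input) {
    $input.Str.split(';', :skip-empty).map('auto ' ~ * ~ ';').join('')
}

# Sample input: one keyword sequence per line
my $text = "kywrd a=b;c=d;\nkywrd x=y;\n";

# The block is called once per match, and $0 is that match's capture
my $out = $text.subst(/ kywrd \s+ (\N+) /, { "kywrd2 " ~ repl($0) }, :g);
print $out;
```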

Upvotes: 5
