Is it possible to have a capture within an interpolated regex?

Question

I wanted to generate a regex from an existing list of values, but when I attempted to use a capture within it, the capture was not present in the match. Is it not possible to have a capture using interpolation, or am I doing something wrong?

my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

grammar Demo1 {
  token TOP {
    [
      || <foo>
      || <bar>
      || <baz>
    ] ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

grammar Demo2 {
  token TOP {
    [ <$test-pattern> ] ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

say $test-pattern, "\n" x 2, Demo1.parse('1 2 3'), "\n" x 2, Demo2.parse('1 2 3');

<foo> || <bar> || <baz>

｢1 2 3｣
 foo => ｢1｣
 bar => ｢2｣
 baz => ｢3｣

｢1 2 3｣

raiph · Accepted Answer

The rule for determining whether an atom of the form <...> captures without further ado is whether or not it starts with a letter or underscore.

If an assertion starts with a letter or underscore, then an identifier is expected/parsed and a match is captured using that identifier as the key in the enclosing match object. For example, <foo::baz-bar> begins with a letter and captures under the key foo::baz-bar.

If an assertion does not begin with a letter or underscore, then by default it does not capture.
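A minimal sketch of the rule using a plain regex rather than a grammar. The lexical token digit-one and the string $pattern are hypothetical names that do not appear in the question. <digit-one> starts with a letter and captures. <$pattern> starts with $ and <[0..9]> starts with [, so neither of them captures.

```raku
# Sketch of the "starts with a letter or underscore" rule.
# digit-one and $pattern are hypothetical names used only for this demo.
my token digit-one { 1 }    # <digit-one> starts with a letter: captures
my $pattern = '2';          # <$pattern> starts with '$': does not capture

# <[0..9]> starts with '[' (a character class): does not capture either.
my $m = '123' ~~ / <digit-one> <$pattern> <[0..9]> /;

say $m.Str;                 # 123
say $m.hash.keys;           # (digit-one)
```

All three assertions take part in the match, but only the one whose contents begin with a letter leaves an entry in the match object.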

To capture the results of an assertion whose first character is not a letter or underscore you can either put it in parens or name it:

( <$test-pattern> ) ** 1..* % \s+

or, to name the assertion:

<test-pattern=$test-pattern> ** 1..* % \s+

or (just another way to have the same naming effect):

$<test-pattern>=<$test-pattern> ** 1..* % \s+

If all you do is put an otherwise non-capturing assertion in parens, then you have not switched capturing on for that assertion. Instead, you've merely wrapped it in an outer capture. The assertion remains non-capturing, and any sub-capture data of the non-capturing assertion is thrown away.

Thus the output of the first solution shown above (wrapping the <$test-pattern> assertion in parens) is:

｢1 2 3｣
 0 => ｢1｣
 0 => ｢2｣
 0 => ｢3｣

Sometimes that's what you'll want, to simplify the parse tree and/or save memory.
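A small sketch that checks the parens form from code instead of by reading the printed tree. It reuses the question's @keys and $test-pattern setup. The grammar name Wrapped is hypothetical. Each repetition of the quantified group produces one positional Match, and none of those matches carries a foo, bar or baz key.

```raku
# Sketch of the parens form, assuming the question's @keys / $test-pattern setup.
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

grammar Wrapped {                       # hypothetical grammar name
    token TOP { ( <$test-pattern> ) ** 1..* % \s+ }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}

my $m = Wrapped.parse('1 2 3');
say $m[0].map(*.Str);                   # (1 2 3): one positional capture per repetition
say $m[0].map(*.hash.elems);            # (0 0 0): the inner foo/bar/baz data is gone
```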

In contrast, if you name an otherwise non-capturing assertion with either of the named forms shown above, then by doing so you convert it into a capturing assertion, which means any sub-capture detail will be retained. Thus the named solutions produce:

｢1 2 3｣
 test-pattern => ｢1｣
  foo => ｢1｣
 test-pattern => ｢2｣
  bar => ｢2｣
 test-pattern => ｢3｣
  baz => ｢3｣
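A sketch that runs both named forms side by side. It assumes the question's @keys and $test-pattern setup. The grammar names NamedInside and NamedOutside are hypothetical. The check confirms that each test-pattern match keeps its inner foo, bar or baz sub-capture.

```raku
# Sketch of both naming forms, assuming the question's @keys / $test-pattern setup.
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

grammar NamedInside {                   # hypothetical: alias inside the angle brackets
    token TOP { <test-pattern=$test-pattern> ** 1..* % \s+ }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}

grammar NamedOutside {                  # hypothetical: bind to a named capture variable
    token TOP { $<test-pattern>=<$test-pattern> ** 1..* % \s+ }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}

for NamedInside, NamedOutside -> $g {
    my $m = $g.parse('1 2 3');
    # Each test-pattern match retains the key of the inner token it matched.
    say $m<test-pattern>.map({ .hash.keys.head });   # (foo bar baz)
}
```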
