Daniel Mita

Reputation: 642

Is it possible to have a capture within an interpolated regex?

I wanted to generate a regex from an existing list of values, but when I used a capture within it, the capture was not present in the match. Is it impossible to capture through interpolation, or am I doing something wrong?

my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

grammar Demo1 {
  token TOP {
    [
      || <foo>
      || <bar>
      || <baz>
    ] ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

grammar Demo2 {
  token TOP {
    [ <$test-pattern> ] ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

say $test-pattern, "\n" x 2, Demo1.parse('1 2 3'), "\n" x 2, Demo2.parse('1 2 3');

Output:

<foo> || <bar> || <baz>

「1 2 3」
 foo => 「1」
 bar => 「2」
 baz => 「3」

「1 2 3」

Upvotes: 8

Views: 107

Answers (1)

raiph

Reputation: 32404

The rule for determining whether an atom of the form <...> captures without further ado is whether or not it starts with a letter or underscore.

If an assertion starts with a letter or underscore, then an identifier is expected/parsed and a match is captured using that identifier as the key in the enclosing match object. For example, <foo::baz-bar qux> begins with a letter and captures under the key foo::baz-bar.

If an assertion does not begin with a letter or underscore, then by default it does not capture.
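
As a minimal sketch of that rule outside a grammar (the variable $digits and the input string are made up for illustration), the following compares an assertion that starts with a letter against one that starts with a sigil:

```raku
# Hypothetical pattern string; <$digits> interpolates it as a regex.
my $digits = '\d+';

# <alpha> starts with a letter, so it captures under the key "alpha".
# <$digits> starts with "$", so it matches but does not capture by default.
my $m = 'a42' ~~ / <alpha> <$digits> /;

say $m;             # the overall match plus any named sub-captures
say $m.hash.keys;   # the named capture keys present in the match
```

Under the rule described above, only alpha should show up as a key here, even though the digits are part of the overall match.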


To capture the results of an assertion whose first character is not a letter or underscore you can either put it in parens or name it:

( <$test-pattern> ) ** 1..* % \s+

or, to name the assertion:

<test-pattern=$test-pattern> ** 1..* % \s+

or (just another way to have the same naming effect):

$<test-pattern>=<$test-pattern> ** 1..* % \s+

If all you do is put an otherwise non-capturing assertion in parens, then you have not switched capturing on for that assertion. Instead, you've merely wrapped it in an outer capture. The assertion remains non-capturing, and any sub-capture data of the non-capturing assertion is thrown away.

Thus the output of the first solution shown above (wrapping the <$test-pattern> assertion in parens) is:

「1 2 3」
 0 => 「1」
 0 => 「2」
 0 => 「3」

Sometimes that's exactly what you want, either to simplify the parse tree or to save memory.

In contrast, if you name an otherwise non-capturing assertion with either of the named forms shown above, then by doing so you convert it into a capturing assertion, which means any sub capture detail will be retained. Thus the named solutions produce:

「1 2 3」
 test-pattern => 「1」
  foo => 「1」
 test-pattern => 「2」
  bar => 「2」
 test-pattern => 「3」
  baz => 「3」
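
For completeness, here is a sketch of the question's grammar rewritten with the first named form. The grammar name Demo3 is just a label chosen for this example; everything else comes from the question's code:

```raku
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

grammar Demo3 {
  token TOP {
    # Naming the interpolated assertion turns it into a capturing one,
    # so the sub-captures (foo, bar, baz) are kept under test-pattern.
    <test-pattern=$test-pattern> ** 1..* % \s+
  }

  token foo { 1 }
  token bar { 2 }
  token baz { 3 }
}

my $match = Demo3.parse('1 2 3');
say $match;

# Each element of the quantified capture is its own match object.
say .hash.keys for $match<test-pattern>.list;
```

Because the assertion is quantified, $match<test-pattern> holds a list of matches, one per item in the input.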

Upvotes: 6
