Reputation: 642
I wanted to generate a regex from an existing list of values, but when I used a capture inside it, the capture was missing from the match. Is it not possible to capture through interpolation, or am I doing something wrong?
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');
grammar Demo1 {
    token TOP {
        [
            || <foo>
            || <bar>
            || <baz>
        ] ** 1..* % \s+
    }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}
grammar Demo2 {
    token TOP {
        [ <$test-pattern> ] ** 1..* % \s+
    }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}
say $test-pattern, "\n" x 2, Demo1.parse('1 2 3'), "\n" x 2, Demo2.parse('1 2 3');
<foo> || <bar> || <baz>
「1 2 3」
 foo => 「1」
 bar => 「2」
 baz => 「3」

「1 2 3」
Upvotes: 8
Views: 107
Reputation: 32404
The rule for determining whether an atom of the form <...> captures without further ado is whether or not it starts with a letter or underscore.
If an assertion starts with a letter or underscore, then an identifier is expected/parsed and a match is captured using that identifier as the key in the enclosing match object. For example, <foo::baz-bar qux> begins with a letter and captures under the key foo::baz-bar.
If an assertion does not begin with a letter or underscore, then by default it does not capture.
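As a small illustration of that rule, here is a sketch using a made-up grammar (the name CaptureRule is hypothetical). It relies on the standard `<.name>` form, which begins with a dot rather than a letter and therefore calls the token without capturing it:

```raku
# Hypothetical grammar for illustration only.
grammar CaptureRule {
    # <foo> starts with a letter, so it captures under the key "foo".
    # <.bar> starts with a dot, so it matches but does not capture.
    token TOP { <foo> <.bar> }
    token foo { 1 }
    token bar { 2 }
}

my $match = CaptureRule.parse('12');

# Collect the names of the named captures on the top-level match.
my @captured = $match.hash.keys.sort;
say @captured;
```

Only foo should appear among the captured keys, even though both tokens took part in the match.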
To capture the results of an assertion whose first character is not a letter or underscore you can either put it in parens or name it:
( <$test-pattern> ) ** 1..* % \s+
or, to name the assertion:
<test-pattern=$test-pattern> ** 1..* % \s+
or (just another way to have the same naming effect):
$<test-pattern>=<$test-pattern> ** 1..* % \s+
If all you do is put an otherwise non-capturing assertion in parens, then you have not switched capturing on for that assertion. Instead, you've merely wrapped it in an outer capture. The assertion remains non-capturing, and any sub-capture data of the non-capturing assertion is thrown away.
Thus the output of the first solution shown above (wrapping the <$test-pattern> assertion in parens) is:
「1 2 3」
 0 => 「1」
 0 => 「2」
 0 => 「3」
Sometimes that's exactly what you want, whether to simplify the parse tree, to save memory, or both.
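To see the paren form in a complete program, here is a minimal sketch. The grammar name DemoParens is made up for this example, and the pattern string is written out literally instead of being built from @keys:

```raku
# Same pattern the question builds with map/join, written out literally.
my $test-pattern = '<foo> || <bar> || <baz>';

# Hypothetical grammar name, for illustration only.
grammar DemoParens {
    token TOP {
        ( <$test-pattern> ) ** 1..* % \s+
    }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}

my $match = DemoParens.parse('1 2 3');

# The quantified parens yield a list of positional captures under index 0.
my @texts = $match[0].map(*.Str);
say @texts;

# Count the named sub-captures kept inside each positional capture.
my @named-counts = $match[0].map(*.hash.elems);
say @named-counts;
```

Each positional capture holds the matched text, but, as described above, the foo/bar/baz detail from the interpolated assertion is not kept inside it.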
In contrast, if you name an otherwise non-capturing assertion with either of the named forms shown above, then by doing so you convert it into a capturing assertion, which means any sub capture detail will be retained. Thus the named solutions produce:
「1 2 3」
 test-pattern => 「1」
  foo => 「1」
 test-pattern => 「2」
  bar => 「2」
 test-pattern => 「3」
  baz => 「3」
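Putting the named form into a full program, a sketch might look like the following. The grammar name DemoNamed is hypothetical. The rest reuses the code from the question and shows one way to read back which token matched each item:

```raku
my @keys = <foo bar baz>;
my $test-pattern = @keys.map({ "<$_>" }).join(' || ');

# Hypothetical grammar name, for illustration only.
grammar DemoNamed {
    token TOP {
        # Naming the interpolated assertion turns it into a capturing one.
        <test-pattern=$test-pattern> ** 1..* % \s+
    }
    token foo { 1 }
    token bar { 2 }
    token baz { 3 }
}

my $match = DemoNamed.parse('1 2 3');

# Each element of $match<test-pattern> is a Match whose single named
# sub-capture tells us which token matched.
my @names = $match<test-pattern>.map({ .hash.keys.head });
say @names;
```

Because the sub-captures are retained, the token name for each item can be recovered from the match tree without re-parsing the input.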
Upvotes: 6