Reputation: 979
I'm trying to write a PEG for an old file format that has about 100 keywords which can't be used as identifiers.
Here's an example of a keyword rule:
IN = { ^"in" } // Caret means case insensitivity
keyword = { IN } // plus others
The identifier rule looks like this:
identifier = @{ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }
As written, this identifier rule also matches every keyword, so I added a negative predicate to exclude them:
identifier = @{ !keyword ~ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }
This kind of works, except when an identifier begins with the same letters as a keyword. For example, the identifier `inner` is treated as the keyword `in` followed by text.
How can I allow identifiers that begin with a keyword? Note that in the PEST parser generator, terminals can only be specified as strings, not as regular expressions.
Upvotes: 4
Views: 480
Reputation: 20559
You can force `keyword` to match only whole words by following it with a negative predicate. For example:
identifier_start = _{ "_" | ASCII_ALPHA }
identifier_continue = _{ "_" | ASCII_ALPHANUMERIC }
keyword = @{ (^"for" | ^"in") ~ !identifier_continue }
identifier = @{ !keyword ~ identifier_start ~ identifier_continue* ~ !identifier_continue }
With these rules, `keyword` matches `for` and `in`, but not `form` or `int`, which are parsed as identifiers instead.
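To see why the trailing `!identifier_continue` matters, here is a rough Python sketch that emulates both versions of the grammar. It is not pest itself: regex negative lookahead `(?!...)` stands in for the PEG `!` predicate, and the names `KEYWORDS`, `NAIVE_IDENTIFIER`, `FIXED_IDENTIFIER`, and `is_identifier` are made up for illustration. The two-entry keyword list is a placeholder for the real one.

```python
import re

# Hypothetical stand-in for the full keyword list.
KEYWORDS = ("for", "in")

START = "[A-Za-z_]"        # mirrors identifier_start
CONTINUE = "[A-Za-z0-9_]"  # mirrors identifier_continue

ALTERNATION = "|".join(re.escape(k) for k in KEYWORDS)

# keyword = { ^"for" | ^"in" }: matches any prefix that spells a keyword.
NAIVE_KEYWORD = f"(?:{ALTERNATION})"

# keyword = @{ (^"for" | ^"in") ~ !identifier_continue }: whole words only.
WHOLE_KEYWORD = f"(?:{ALTERNATION})(?!{CONTINUE})"


def identifier_pattern(keyword):
    # identifier = @{ !keyword ~ identifier_start ~ identifier_continue* }
    # re.IGNORECASE plays the role of the caret on the keyword strings.
    return re.compile(f"(?!{keyword}){START}{CONTINUE}*", re.IGNORECASE)


NAIVE_IDENTIFIER = identifier_pattern(NAIVE_KEYWORD)
FIXED_IDENTIFIER = identifier_pattern(WHOLE_KEYWORD)


def is_identifier(pattern, text):
    """Return True if the whole of text is accepted as an identifier."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for word in ("in", "IN", "inner", "for", "form", "int", "_in"):
        print(word,
              is_identifier(NAIVE_IDENTIFIER, word),
              is_identifier(FIXED_IDENTIFIER, word))
```

In this emulation the naive pattern rejects `inner` because the lookahead sees the prefix `in`, while the fixed pattern only rejects a word when the keyword is not followed by another identifier character.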
Upvotes: 4