oorst

Ambiguity of PEG grammar with PEST parser

I'm trying to write a PEG for an old file format that has about 100 keywords which can't be used as identifiers.

Here's an example of a keyword rule:

IN = { ^"in" } // ^ makes the match case-insensitive

keyword = { IN } // plus others

The identifier rule looks like this:

identifier = @{ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }

As written, this identifier rule also matches every keyword, so I added a negative lookahead for keyword:

identifier = @{ !keyword ~ ( "_" | ASCII_ALPHA ) ~ ASCII_ALPHANUMERIC* }

This mostly works, except when an identifier begins with the same letters as a keyword. For example, inner is not accepted as an identifier: keyword matches its first two letters, so !keyword fails and the input is treated as the keyword in followed by the text ner.
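The failure can be reproduced outside pest with a small hand-written Rust sketch. This is not pest's API or generated code; match_keyword_prefix, match_identifier_naive and the two-entry KEYWORDS list are hypothetical stand-ins that follow the PEG semantics of the question's keyword rule and the !keyword version of identifier. Because keyword has no lookahead of its own, !keyword fails on any input that merely starts with a keyword.

```rust
// Hypothetical keyword list standing in for the format's ~100 keywords.
const KEYWORDS: &[&str] = &["for", "in"];

/// keyword = { ^"for" | ^"in" } (no lookahead, as in the question).
/// Returns the bytes consumed, or None if no keyword matches here.
fn match_keyword_prefix(input: &str) -> Option<usize> {
    KEYWORDS.iter().find_map(|kw| match input.get(..kw.len()) {
        // Case-insensitive comparison mirrors the ^"..." terminals.
        Some(prefix) if prefix.eq_ignore_ascii_case(kw) => Some(kw.len()),
        _ => None,
    })
}

/// identifier = @{ !keyword ~ ("_" | ASCII_ALPHA) ~ ASCII_ALPHANUMERIC* }
/// Returns the bytes consumed, or None if the rule fails.
fn match_identifier_naive(input: &str) -> Option<usize> {
    // !keyword fails as soon as a keyword matches a prefix of the input.
    if match_keyword_prefix(input).is_some() {
        return None;
    }
    let mut chars = input.chars();
    let first = chars.next()?;
    if !(first == '_' || first.is_ascii_alphabetic()) {
        return None;
    }
    // ASCII_ALPHANUMERIC* is greedy; every accepted char is one byte.
    Some(1 + chars.take_while(|c| c.is_ascii_alphanumeric()).count())
}
```

With these definitions, match_identifier_naive("inner") and match_identifier_naive("form") both return None, while match_identifier_naive("outer") returns Some(5).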

How can I allow identifiers that begin with a keyword? Note that in the PEST parser generator, terminals can only be specified as strings, not regular expressions.

Answers (1)

L. F.

You can make keyword match only whole words by following it with a negative lookahead predicate. For example:

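identifier_start = _{ "_" | ASCII_ALPHA }            // silent helper: first character
identifier_continue = _{ "_" | ASCII_ALPHANUMERIC }  // silent helper: later characters

// !identifier_continue stops keyword from matching a prefix such as "in" in "inner"
keyword = @{ (^"for" | ^"in") ~ !identifier_continue }
// the final !identifier_continue always holds after the greedy *, but is harmless
identifier = @{ !keyword ~ identifier_start ~ identifier_continue* ~ !identifier_continue }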

With these rules, keyword matches for and in but not form or int, so form and int are accepted as identifiers.
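To check the behaviour of these rules without running pest, here is a minimal Rust sketch that hand-codes the same PEG semantics. It is an illustration, not generated pest code; the function names and the KEYWORDS list are hypothetical. match_keyword commits to the first keyword that matches, as PEG ordered choice does, and then applies !identifier_continue once.

```rust
// Hypothetical keyword list standing in for the format's ~100 keywords.
const KEYWORDS: &[&str] = &["for", "in"];

/// identifier_start = _{ "_" | ASCII_ALPHA }
fn is_identifier_start(c: char) -> bool {
    c == '_' || c.is_ascii_alphabetic()
}

/// identifier_continue = _{ "_" | ASCII_ALPHANUMERIC }
fn is_identifier_continue(c: char) -> bool {
    c == '_' || c.is_ascii_alphanumeric()
}

/// keyword = @{ (^"for" | ^"in") ~ !identifier_continue }
/// Returns the bytes consumed, or None if the rule fails.
fn match_keyword(input: &str) -> Option<usize> {
    // Ordered choice: take the first keyword that matches case-insensitively.
    let n = KEYWORDS.iter().find_map(|kw| match input.get(..kw.len()) {
        Some(prefix) if prefix.eq_ignore_ascii_case(kw) => Some(kw.len()),
        _ => None,
    })?;
    // !identifier_continue: reject the match if a word character follows.
    match input[n..].chars().next() {
        Some(c) if is_identifier_continue(c) => None,
        _ => Some(n),
    }
}

/// identifier = @{ !keyword ~ identifier_start ~ identifier_continue* ~ !identifier_continue }
/// Returns the bytes consumed, or None if the rule fails.
fn match_identifier(input: &str) -> Option<usize> {
    // !keyword: only a whole-word keyword blocks the identifier now.
    if match_keyword(input).is_some() {
        return None;
    }
    let mut chars = input.chars();
    if !is_identifier_start(chars.next()?) {
        return None;
    }
    // identifier_continue* is greedy, so the trailing !identifier_continue
    // always succeeds here. Every accepted char is ASCII, i.e. one byte.
    Some(1 + chars.take_while(|&c| is_identifier_continue(c)).count())
}
```

For example, match_identifier("inner"), match_identifier("form") and match_identifier("int") return Some(5), Some(4) and Some(3), while match_identifier("in") and match_identifier("FOR") return None. One consequence of committing to the first alternative: if a keyword is a prefix of a keyword listed after it (say in followed by a hypothetical int keyword), the input int would fail the lookahead after in and never try int. Listing longer keywords first, or writing the lookahead inside each alternative, avoids this in both the sketch and the pest grammar.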
