Angel Todorov

Reputation: 97

ANTLR with non-greedy rules

I would like to have the following grammar (part of it):

expression
    : expression 'AND' expression
    | expression 'OR' expression
    | StringSequence
    ;

StringSequence
    : StringCharacters
    ;

fragment
StringCharacters
    : StringCharacter+
    ;

fragment
StringCharacter
    : ~["\\]
    | EscapeSequence
    ;

It should match things like "a b c d f" (without the quotes), as well as things like "a AND b AND c".

The problem is that my StringSequence rule is greedy and consumes the AND/OR as well. I've tried different approaches but couldn't get my grammar to work correctly. Is this possible with ANTLR4? Note that I don't want to put quotes around every string. With quotes it works fine, because the rule becomes non-greedy, i.e.:

StringSequence
    : '"' StringCharacters? '"'
    ;

Upvotes: 0

Views: 1288

Answers (2)

Mike Lischke

Reputation: 53407

You have no whitespace rule, so StringCharacter matches everything except quote and backslash characters (plus the escape sequences), including spaces and the letters of AND/OR. Add a whitespace rule (and exclude whitespace from StringCharacter) so that the lexer can produce individual AND/OR tokens. Additionally, I recommend defining lexer rules for the string literals ('AND', 'OR') instead of embedding them in the parser rule(s). This way you not only get meaningful names for the tokens (instead of auto-generated ones), but you can also better control the match order.
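The advice above could look roughly like the following combined grammar. This is a sketch under assumptions, not a tested drop-in replacement: the grammar name `BoolSearch`, the `query` and `words` parser rules and the `WORD`/`WS` lexer rules are hypothetical names, the question's `EscapeSequence` alternative is left out for brevity, and the keywords are matched case-sensitively.

```antlr
// Hypothetical grammar name; a sketch of the whitespace + keyword-token approach.
grammar BoolSearch;

// Hypothetical entry rule: the whole input must be one expression.
query
    : expression EOF
    ;

// In ANTLR 4 left-recursive rules, earlier alternatives get higher
// precedence, so AND binds tighter than OR here.
expression
    : expression AND expression
    | expression OR expression
    | words
    ;

// "a b c d f" is lexed as several WORD tokens that the parser groups together.
words
    : WORD+
    ;

// Keyword rules are defined before WORD: when both match the same text
// ("AND"), ANTLR picks the rule defined first. Longer input such as
// "ANDROID" is still lexed as WORD, because the longest match wins.
AND  : 'AND' ;
OR   : 'OR' ;

// A word is a run of characters that are not whitespace, quotes or backslashes.
WORD : ~[ \t\r\n"\\]+ ;

// Whitespace separates tokens. Sending it to the hidden channel (instead of
// skipping it) keeps it in the token stream, so the original text stays recoverable.
WS   : [ \t\r\n]+ -> channel(HIDDEN) ;
```

With the Java target, calling `getText()` on a `words` context for the input "a b c d f" would return "abcdf", because hidden-channel tokens are not part of the parse tree; calling `getText(ctx)` on the `CommonTokenStream` instead concatenates every token in the rule's range regardless of channel and returns "a b c d f".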

Upvotes: 2

CoronA

Reputation: 8085

Another, admittedly naive, solution:

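// A string is one or more of the fragments below.
StringSequence
    : (StringCharacter | NotAnd | NotOr)+
    ;

// "A" or "AN" followed by a character that does not complete "AND".
fragment NotAnd
    : 'AN' ~'D'
    | 'A' ~'N'
    ;

// "O" followed by a character that does not complete "OR".
fragment NotOr
    : 'O' ~'R'
    ;

// Any character except the first letters of the keywords.
fragment StringCharacter
    : ~('O'|'A')
    ;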

This gets a bit more complex once whitespace rules are involved. Another solution would be to use semantic predicates that look ahead and prevent the lexer from reading keywords as part of a string.

Upvotes: 1
