Devs love ZenUML

Reputation: 11842

How to remove ambiguity from this syntax (antlr4)

I am writing a tool to generate sequence diagrams from text. I need to support these two syntaxes:

  1. anInstance:AClass.DoSomething() and
  2. participant A -> participant B: Any character except for \r\n (<>{}?)etc..

Let's call the first one the strict syntax and the second one the free syntax. In anInstance:AClass.DoSomething(), I need anInstance:AClass to be matched by the to rule ( ID ':' ID ), as in the strict syntax. However, :AClass.DoSomething() is matched by CONTENT first. I am thinking of some kind of lookahead that checks whether -> is present, but I have not been able to figure it out.

Strict syntax

message
 : to '.' signature
 ;
signature
 : methodName '()'
 ;
to
 : ID ':' ID
 ;
methodName
 : ID
 ;

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

Free syntax

asyncMessage
 : source '->' target content
 ;
source
 : ID+
 ;
target
 : ID+
 ;
content
 : CONTENT
 ;

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;
CONTENT
 : ':' ~[\r\n]+
 ;
SPACE
 : [ \t\r\n] -> channel(HIDDEN)
 ;

Upvotes: 0

Views: 80

Answers (1)

Jiri Tousek

Reputation: 12440

You need to understand how the ANTLR lexer works:

  • It uses whichever rule matches the longest part of the input (starting at the current position)
  • If multiple rules match the same input (i.e. the same length), the first one (in the order they are defined) is used

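With your current lexer rules, CONTENT takes precedence whenever the lexer encounters a : followed by at least one more character on the same line, because it matches more input than the ':' literal alone. As a result, the ':' in ID ':' ID will never be matched as a separate token.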
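
To make this concrete, here is a minimal sketch that reproduces the problem, trimmed down from the rules in the question. The grammar name Ambiguity is made up for illustration, and the comments trace how the two lexer rules above apply to the strict-syntax input.

```antlr
grammar Ambiguity;   // hypothetical name, used only for this illustration

// Parser rule from the strict syntax. The literal ':' creates an implicit
// token that matches exactly one character.
to : ID ':' ID ;

ID      : [a-zA-Z_] [a-zA-Z_0-9]* ;
CONTENT : ':' ~[\r\n]+ ;                  // ':' plus the rest of the line
SPACE   : [ \t\r\n] -> channel(HIDDEN) ;

// Input: anInstance:AClass.DoSomething()
//   ID       "anInstance"             (CONTENT cannot start at a letter)
//   CONTENT  ":AClass.DoSomething()"  (longer than ':' alone, so it wins)
// The parser therefore sees ID CONTENT and cannot match "to".
```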

With ANTLR 4, you should probably use lexer modes in this case: when you encounter the : in the free form, switch to a "free" mode, and define the CONTENT lexer rule so that it is only available in that mode.
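
A rough sketch of what that could look like follows. It is untested and rests on a few assumptions: modes are only allowed in a lexer grammar, so the grammar is split in two; the names SequenceLexer, SequenceParser, FREE, ARROW, COLON, DOT, PARENS and the FREE_* rules are invented here; and the switch happens at '->' rather than at the ':' itself, since in the default mode the lexer has no other way to tell that it is inside a free-syntax line.

```antlr
// SequenceLexer.g4 (hypothetical file name)
lexer grammar SequenceLexer;

// Default mode: no CONTENT rule here, so ':' is always a token of its own.
ARROW  : '->' -> pushMode(FREE) ;   // assumption: '->' marks a free-syntax line
COLON  : ':' ;
DOT    : '.' ;
PARENS : '()' ;
ID     : [a-zA-Z_] [a-zA-Z_0-9]* ;
SPACE  : [ \t\r\n] -> channel(HIDDEN) ;

mode FREE;
// Target names are still reported to the parser as plain ID tokens.
FREE_ID    : [a-zA-Z_] [a-zA-Z_0-9]* -> type(ID) ;
// ':' plus the rest of the line, then back to the default mode.
CONTENT    : ':' ~[\r\n]+ -> popMode ;
FREE_SPACE : [ \t] -> channel(HIDDEN) ;
// A line break without any content also ends the free mode.
FREE_EOL   : [\r\n] -> channel(HIDDEN), popMode ;
```

```antlr
// SequenceParser.g4 (hypothetical file name)
parser grammar SequenceParser;
options { tokenVocab = SequenceLexer; }

message      : to DOT signature ;
signature    : methodName PARENS ;
to           : ID COLON ID ;
methodName   : ID ;

asyncMessage : source ARROW target content ;
source       : ID+ ;
target       : ID+ ;
content      : CONTENT ;
```

With this split, anInstance:AClass.DoSomething() should lex as ID COLON ID DOT ID PARENS, because CONTENT does not exist in the default mode, while participant A -> participant B: ... only produces a CONTENT token after the arrow has been seen.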

See this question for an idea about how ANTLR 4 lexer modes work.

Upvotes: 1
