Boyolame

Reputation: 339

Parsing simple template with Antlr4

I'm trying to parse expressions like "he did [something called :action]", where action is a variable and the brackets mark the enclosed block as optional. If any of the variables inside the brackets is missing, I need to replace the whole block with a placeholder such as "nothing".

I think the logic part is simple because I'm familiar with the visitor mechanism, but I can't get the strings to parse.

I tried the following grammar, but it produces an error node instead of an optionalParameter. I couldn't find the problem. Can anyone take a look at this grammar and tell me what I'm doing wrong?

grammar NamedParam;

query: (QUERY_CONTENT optionalParameter)*;

optionalParameter: '[' (STRING namedParameter)* ']';

namedParameter: ':' IDENTIFIER;

IDENTIFIER
    : (ALPHANUMERIC)+;

fragment ALPHANUMERIC
    : [A-Za-z0-9];

STRING : ~(':' | ']')* ;
QUERY_CONTENT : ~('[')* ;

Upvotes: 0

Views: 549

Answers (1)

CoronA

Reputation: 8075

Your understanding of ANTLR parsing seems to be incomplete:

ANTLR parsing is strictly preceded by ANTLR lexing. In the lexing phase the complete text is tokenized without any knowledge of the parser rules. The lexer chooses each token as follows:

  • prefer the longest token
  • in case of two matches with same length prefer the first defined token

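For your example the lexer therefore produces only two tokens (even if there is an additional whitespace rule):

he did [something called (-> STRING)
:action] (-> QUERY_CONTENT)

At the ':' the literal ':' token matches a single character, while QUERY_CONTENT matches everything up to the end of the input, so the longer QUERY_CONTENT match wins.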

What you want: The parser should control which token rule should be applied.

he did (->QUERY_CONTENT) 
...

but this fails because the lexer finds a longer match at the same position: the STRING "he did [something called ".

Avoid tokens that subsume other tokens

  • Adding a non-alphanumeric character (even whitespace) other than : or ] to an IDENTIFIER turns the resulting token into a STRING.
  • Adding a character other than [ to a STRING turns the resulting token into a QUERY_CONTENT.

Sometimes this cannot be avoided, but it carries a permanent risk of parsing errors that are hard to understand.

How to resolve this:

  • rewrite your grammar to fit the ANTLR concept (this is probably very hard to achieve, if you want to keep this syntax)
  • refine your language syntax (more delimiter symbols, non-subsuming tokens)
  • use a PEG parser (e.g. parboiled or Rats!). These parsers work much the way you expected: the parser rule in progress decides which token to match next.
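To illustrate the last option, here is a minimal hand-written sketch in Python in which the reader decides by context how to interpret each character, so there is no separate lexing phase. It uses neither ANTLR nor the parboiled or Rats! APIs. The names render, PLACEHOLDER and PARAM are hypothetical, and it assumes that blocks are not nested and that text outside the brackets is copied verbatim.

```python
import re

PLACEHOLDER = "nothing"  # hypothetical placeholder for an incomplete block
PARAM = re.compile(r":([A-Za-z0-9]+)")  # named parameter, e.g. :action


def render(template, values):
    """Expand optional [...] blocks; a block with a missing parameter
    is replaced as a whole by PLACEHOLDER."""
    out = []
    pos = 0
    while pos < len(template):
        if template[pos] == "[":
            # Inside a block ':' introduces a parameter.
            end = template.find("]", pos)
            if end == -1:
                raise ValueError("unclosed '[' at position %d" % pos)
            block = template[pos + 1:end]
            names = PARAM.findall(block)
            if all(name in values for name in names):
                out.append(PARAM.sub(lambda m: str(values[m.group(1)]), block))
            else:
                out.append(PLACEHOLDER)
            pos = end + 1
        else:
            # Outside a block everything up to the next '[' is plain text.
            nxt = template.find("[", pos)
            if nxt == -1:
                nxt = len(template)
            out.append(template[pos:nxt])
            pos = nxt
    return "".join(out)


print(render("he did [something called :action]", {"action": "jump"}))
print(render("he did [something called :action]", {}))
```

With the value for action supplied the block is expanded; without it the whole block collapses to the placeholder. A real PEG grammar would express the same context dependence declaratively instead of with explicit index handling.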

Upvotes: 0
