ANTLR4: how do I change hidden characters based on the rule?

Question

I am trying to parse input files with an awkward structure: a newline is usually insignificant and should be skipped, but in some cases it terminates a statement and therefore must be matched. As soon as I match it in a parser rule, however, the newline becomes an ordinary token everywhere and can no longer be skipped.

To illustrate my problem consider the following grammar:

text
    : (line '\n')+
    ;

line
    : ( ID )+
    | '(' ID* ')'
    ;

ID  : [a-zA-Z]+
    ;

WS  : [ \t\r\n]+ -> skip
    ;

In this grammar I would like to parse statements like the following:

a b
c d
(e
f)

Yet I get the following error:

line 3:2 extraneous input '\n' expecting {')', ID}

because the newline inside the parentheses is not skipped. The real grammar is much more complicated, so it is not feasible to simply put "'\n'?" everywhere it is needed.

What is the best way to deal with this issue?

CoronA · Accepted Answer

For both of my suggestions you need to send whitespace to the HIDDEN channel (rather than skipping it).
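
A minimal sketch of that change, applied to the WS rule from the question. Only the lexer command differs; channel(HIDDEN) is a standard ANTLR4 lexer command.

```antlr
// Whitespace stays in the token stream but is invisible to the parser.
WS  : [ \t\r\n]+ -> channel(HIDDEN)
    ;
```

The literal '\n' in the text rule most likely has to go as well: as long as it appears in a parser rule, ANTLR defines an implicit token for it, and that token should take priority over WS for a lone newline.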

For flexible control over whitespace (or newlines), you could apply the solution from the question "Allow Whitespace sections ANTLR4". It lets you enable or disable whitespace at any point in the grammar.

An alternative is to keep whitespace on the hidden channel and check for the newline in a semantic predicate instead of matching it as a token in the rule:

text
  : (line {/*check that the last whitespace contained a newline*/}?)+
  ;

For the implementation you could use BufferedTokenStream#getHiddenTokensToRight or BufferedTokenStream#getHiddenTokensToLeft; both let you read hidden (off-channel) tokens.
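
The predicate could be filled in roughly as follows. This is an untested sketch for the Java target, not a definitive implementation: the grammar name Lines and the helper newlineAhead() are made up for illustration, and it assumes the parser is fed by a CommonTokenStream (which extends BufferedTokenStream). The extra predicate inside line is my own addition; without it, the ( ID )+ loop would keep consuming IDs across hidden newlines.

```antlr
grammar Lines;

@parser::members {
    // Hypothetical helper: true if the hidden tokens directly before the
    // next visible token contain a newline, or if the input ends here.
    boolean newlineAhead() {
        Token next = _input.LT(1);
        if (next.getType() == Token.EOF) return true;
        List<Token> hidden = ((BufferedTokenStream) _input)
                .getHiddenTokensToLeft(next.getTokenIndex());
        if (hidden == null) return false;   // nothing hidden in between
        for (Token t : hidden) {
            if (t.getText().indexOf('\n') >= 0) return true;
        }
        return false;
    }
}

text
    : (line {newlineAhead()}?)+ EOF
    ;

line
    : ID ({!newlineAhead()}? ID)*   // stop at a hidden newline
    | '(' ID* ')'                   // newlines inside parentheses are ignored
    ;

ID  : [a-zA-Z]+
    ;

WS  : [ \t\r\n]+ -> channel(HIDDEN)
    ;
```

On the sample input from the question, the intent is that "a b" and "c d" become separate lines while "(e" and "f)" are joined, because the second alternative of line never looks at the hidden channel. If a statement is not followed by a newline, the predicate in text fails and ANTLR reports a failed-predicate error.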
