Reputation: 11
I want to extract all preprocessor statements from a C source file and ignore every other statement. I've tried adding a last rule to the lexer like
Unknown : . -> skip ; // or -> channel(HIDDEN) ;
or adding a last rule to the parser like
ignored : . ;
but neither works.
Here is my grammar:
grammar PreProcessStatement;
pre_if_statement
: pre_if pre_elif* pre_else? pre_endif
;
pre_if : PreProcessBegin 'if' statement;
pre_endif : PreProcessBegin 'endif' ;
pre_else : PreProcessBegin 'else' ;
pre_elif : PreProcessBegin 'elif'statement ;
pre_define : PreProcessBegin 'define' statement;
pre_undef : PreProcessBegin 'undef'statement ;
pre_pragma : PreProcessBegin 'pragma'statement;
statement
: IDENTIFIER
| statement Condition statement
| '(' statement (Condition | Logic_or | Logic_and) statement ')'
| statement (Logic_or | Logic_and) statement
;
Logic_or
: '||'
;
Logic_and
: '&&'
;
PreProcessBegin : '#' ;
Condition : '==' | '>' | '>='| '<' | '<=' ;
NUM : INT | HEX ;
STRID : '"'ID'"' ;
IDENTIFIER : [a-zA-Z_0-9]+ ;
ID : [a-zA-Z_]+ ;
INT : [0-9]+ ;
HEX : '0x'INT;
WS : [ \t\n\r]+ -> skip ;
NewLine : ('\n' | '\r' | '\n\r');
MulLine : '\\' NewLine -> skip ;
Unknown : .*? -> skip ; // or -> channel(HIDDEN) ;
Input:
#if (test == ttt)
#elif rrrr
#else
aaa
#endif
Error:
line 4:0 extraneous input 'aaa' expecting '#'
I've read the question linked below, but its solution does not work for me: Skipping unmatched input in Antlr
What's wrong with my grammar?
Upvotes: 1
Views: 892
Reputation: 1003
The aaa input won't match the Unknown token. It matches the IDENTIFIER : [a-zA-Z_0-9]+ token instead, which is defined before the Unknown rule.
Put the Unknown rule before the other tokens and add a semantic predicate to it that checks whether the first character of the line is something other than a # character. If it is, skip the whole line up to and including the NewLine token.
Unknown : {getCharPositionInLine() == 0 && _input.LA(1) != '#'}? .*? NewLine -> skip;
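To make the intent of that predicate concrete, here is a minimal sketch in plain Python rather than ANTLR. It is a hand-written emulation, not generated lexer code: a line is kept only when its first character is #, and everything else is dropped. The function name directive_lines is hypothetical, and the handling of backslash continuation lines is an extra assumption borrowed from the MulLine rule in the question; the predicate above does not do that on its own.

```python
def directive_lines(source):
    """Keep lines whose first character is '#'; drop every other line.

    Assumption: a directive ending in a backslash continues on the next
    line, so that next line is kept as well.
    """
    kept = []
    continuing = False  # True while inside a backslash-continued directive
    for line in source.splitlines():
        is_directive = continuing or line.startswith("#")
        if is_directive:
            kept.append(line)
        continuing = is_directive and line.endswith("\\")
    return kept
```

Applied to the input from the question, this keeps the four # lines and drops aaa. Note that it only looks at column 0, exactly like getCharPositionInLine() == 0, so a directive indented with spaces would be dropped too.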
When you spot a # character, enter a new lexer mode, PREPROCESSOR. From then on, only the tokens defined within the PREPROCESSOR mode are used. Exit the mode when a new line occurs. So outside the mode we look for just two tokens: PreProcessBegin (a line that starts with a # character) and Unknown (a line without a #). Inside the PREPROCESSOR mode we match the statements as in any other regular language. Note that ANTLR 4 only allows modes in a lexer grammar, so the combined grammar has to be split into a separate lexer grammar and parser grammar.
Example of the lexer:
PreProcessBegin : '#' -> pushMode(PREPROCESSOR); // enter mode
Unknown : .*? NewLine -> skip; // or skip the line
mode PREPROCESSOR; // when in PREPROCESSOR mode use defined below tokens
(...)
Condition : '==' | '>' | '>='| '<' | '<=';
IDENTIFIER : [a-zA-Z_0-9]+ ;
ID : [a-zA-Z_]+ ;
INT : [0-9]+ ;
(...)
NewLine : ('\r\n' | '\n' | '\r') -> popMode; // exit mode; '\r\n' is one token, so the mode is popped only once
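The two-mode idea can be illustrated with a small hand-written tokenizer in Python. This is only a sketch of the control flow, not ANTLR output: the default mode either sees # at the start of a line and switches to PREPROCESSOR mode, or skips the whole line; PREPROCESSOR mode tokenizes until the end of the line and then switches back. The token names follow the grammar in the question, while tokenize, PREPROCESSOR_TOKENS and the Logic and Paren token names are hypothetical, and keywords such as if come out as plain IDENTIFIER tokens here.

```python
import re

# Patterns that are active only in PREPROCESSOR mode.
# Python alternation takes the first match, so '>=' must come before '>'.
PREPROCESSOR_TOKENS = re.compile(r"""
    (?P<WS>[ \t]+)
  | (?P<Condition>==|>=|<=|>|<)
  | (?P<Logic>\|\||&&)
  | (?P<Paren>[()])
  | (?P<IDENTIFIER>[A-Za-z_0-9]+)
""", re.VERBOSE)


def tokenize(source):
    """Return (type, text) pairs for directive lines; drop all other lines."""
    tokens = []
    for line in source.splitlines():
        if not line.startswith("#"):
            continue  # default mode: the Unknown rule skips the line
        tokens.append(("PreProcessBegin", "#"))  # like pushMode(PREPROCESSOR)
        pos = 1
        while pos < len(line):
            match = PREPROCESSOR_TOKENS.match(line, pos)
            if match is None:
                raise ValueError(f"unexpected character {line[pos]!r}")
            if match.lastgroup != "WS":  # whitespace is skipped
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        tokens.append(("NewLine", "\n"))  # like popMode at end of line
    return tokens
```

Run on the input from the question, the aaa line produces no tokens at all, so a parser working on this stream never sees the extraneous input that caused the reported error.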
Upvotes: 1