Is there a way to tell gocc to ignore things in the lexical parser? E.g., for
2022-01-18 11:33:21.9885 [21] These are strings that I need to ignore, until - MYKW - Start Active One: 1/18/2022 11:33:21 AM
I want to tell gocc to ignore everything from [21] all the way to until. Here is what I've been trying:
/* Lexical part */
_letter : 'A'-'Z' | 'a'-'z' | '_' ;
_digit : '0'-'9' ;
_timestamp1 : _digit | ' ' | ':' | '-' | '.' ;
_timestamp2 : _digit | ' ' | ':' | '/' | 'A' | 'P' | 'M' ;
_ignore : '[' { . } ' ' '-' ' ' 'M' 'Y' 'K' 'W' ' ' '-' ' ' ;
_lineend : [ '\r' ] '\n' ;
timestamp : _timestamp1 { _timestamp1 } _ignore ;
taskLogStart : 'S' 't' 'a' 'r' 't' ' ' ;
jobName : { . } _timestamp2 { _timestamp2 } _lineend ;
/* Syntax part */
Log
: timestamp taskLogStart jobName ;
However, the parser failed at:
error: expected timestamp; got: unknown/invalid token "2022-01-18 11:33:21.9885 [21] T"
I think it should work because the following ignore rules work perfectly fine for skipping comments:
!lineComment : '/' '/' { . } '\n' ;
!blockComment : '/' '*' { . | '*' } '*' '/' ;
and I'm just applying the same approach to my normal text parsing.
It doesn't work that way. The EBNF looks very much like regular expressions, but it does not work like regular expressions at all. Take the line:
2022-01-18 11:33:21.9885 [21] These are strings that I need to ignore, until - MYKW - Start Active One: 1/18/2022 11:33:21 AM
To match it with a regular expression, the pattern can simply be:
([0-9.: -]+).*? - MYKW - Start ([^:]+):.*$
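To make the regular-expression side concrete, here is a minimal Go sketch (Go only because gocc is a Go tool) that applies the pattern above to the sample line with the standard regexp package, which supports the lazy .*? quantifier. The helper name parseLine is hypothetical, and trimming the trailing space from group 1 is our own choice, not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// logLine is the pattern from the answer: group 1 captures the leading
// timestamp, the lazy .*? skips the text to ignore, and group 2 captures
// the job name between "Start " and the next ':'.
var logLine = regexp.MustCompile(`([0-9.: -]+).*? - MYKW - Start ([^:]+):.*$`)

// parseLine is a hypothetical helper returning the timestamp and job name,
// with ok == false when the line does not match.
func parseLine(line string) (timestamp, job string, ok bool) {
	m := logLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	// Group 1 also swallows the space before '[', so trim it.
	return strings.TrimSpace(m[1]), m[2], true
}

func main() {
	line := "2022-01-18 11:33:21.9885 [21] These are strings that I need to ignore, until - MYKW - Start Active One: 1/18/2022 11:33:21 AM"
	ts, job, ok := parseLine(line)
	fmt.Println(ts, "|", job, "|", ok)
}
```

The lazy .*? works here only because the regex engine keeps checking whether " - MYKW - Start" can match at each position, which is exactly the context-dependent behaviour described next.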
However, that cannot be translated directly into an EBNF definition, because the regular expression relies on the context around each element to pinpoint a match (e.g., the lazy .*? is a local rule that only works because of the context it is in: it stops as soon as the following " - MYKW - " can match). gocc, however, generates an LR parser, which works from a context-free grammar.
Basically, with a context-free grammar, each time the lexer tries to do a .* match, it matches against all existing lexical symbols (i.e., each lexical symbol can be considered a global rule that is not affected by the context it is in). I cannot describe it precisely, but neither the previous context nor the symbol following it is involved in the next match. That is why the OP's grammar fails.
For a real example of how '{ . }' can be used, see
How to describe this event log in formal BNF?