Leon Chen

Reputation: 407

Can I use ANTLR to parse partial data?

I am trying to use ANTLR to parse a log file. Because I am only interested in part of the log, I want to write a partial parser that processes only the important parts.

For example, I want to parse this segment:

[ 123 begin ]

So I wrote the grammar:

log :   
    '[' INT 'begin' ']'
    ;


INT : '0'..'9'+
    ;


NEWLINE
    : '\r'? '\n'
    ;

WS
    : (' '|'\t')+ {skip();}
    ;

But the segment may appear in the middle of a line, for example:

 111 [ 123 begin ] 222

From the discussion in "What is the wrong with the simple ANTLR grammar?", I understand why my grammar can't process the line above.

Is there any way to make ANTLR ignore the input it can't match and continue processing the remaining text?

Thanks for any advice! Leon

Upvotes: 7

Views: 1313

Answers (1)

Bart Kiers

Reputation: 170158

Since a '[' may also need to be skipped when it appears outside of [ 123 begin ], there's no way to handle this in the lexer alone. You'll have to create a parser rule that matches the token(s) to be skipped (see the noise rule).

You'll also need to create a fall-through rule that matches any single character if none of the other lexer rules match (see the ANY rule).

A quick demo:

grammar T;

parse
    : ( log {System.out.println("log=" + $log.text);}
      | noise
      )*
      EOF
    ;

log : OBRACK INT BEGIN CBRACK
    ;

noise
    : ~OBRACK                  // any token except '['
    | OBRACK ~INT              // a '[' followed by any token except an INT
    | OBRACK INT ~BEGIN        // a '[', an INT and any token except a BEGIN
    | OBRACK INT BEGIN ~CBRACK // a '[', an INT, a BEGIN and any token except ']'
    ;

BEGIN   : 'begin';
OBRACK  : '[';
CBRACK  : ']';
INT     : '0'..'9'+;
NEWLINE : '\r'? '\n';
WS      : (' '|'\t')+ {skip();};
ANY     : .;
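
To get a feel for the "match the segment, treat everything else as noise" idea without generating a parser, here is a rough hand-written Java sketch. It is not ANTLR output and does not use the ANTLR runtime: the class name PartialLogScan and its methods are made up for illustration, the lexer is approximated with a regular expression, and noise is skipped one token at a time, which is simpler than (and not identical to) the noise rule above. Matched tokens are re-joined with single spaces, so the original spacing is not preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated lexer/parser; names are illustrative only.
final class PartialLogScan {
    // Token kinds in the same order as the lexer rules; ANY is the fall-through.
    private static final String[] KINDS =
        {"BEGIN", "OBRACK", "CBRACK", "INT", "NEWLINE", "WS", "ANY"};

    // Regex approximation of the lexer: the first alternative that matches wins.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<BEGIN>begin)|(?<OBRACK>\\[)|(?<CBRACK>\\])|(?<INT>[0-9]+)"
        + "|(?<NEWLINE>\\r?\\n)|(?<WS>[ \\t]+)|(?<ANY>.)", Pattern.DOTALL);

    // The token sequence that the log rule matches.
    private static final String[] LOG_SHAPE = {"OBRACK", "INT", "BEGIN", "CBRACK"};

    // Returns {kind, text} pairs; WS tokens are dropped, like skip() does.
    static List<String[]> tokenize(String input) {
        List<String[]> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    if (!kind.equals("WS")) {
                        tokens.add(new String[] {kind, m.group()});
                    }
                    break;
                }
            }
        }
        return tokens;
    }

    // Collects every OBRACK INT BEGIN CBRACK run; any other token is noise.
    static List<String> findLogs(String input) {
        List<String[]> tokens = tokenize(input);
        List<String> logs = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (matchesAt(tokens, i)) {
                List<String> parts = new ArrayList<>();
                for (int k = 0; k < LOG_SHAPE.length; k++) {
                    parts.add(tokens.get(i + k)[1]);
                }
                logs.add(String.join(" ", parts));
                i += LOG_SHAPE.length;
            } else {
                i++; // noise: skip a single token and try again
            }
        }
        return logs;
    }

    private static boolean matchesAt(List<String[]> tokens, int start) {
        if (start + LOG_SHAPE.length > tokens.size()) {
            return false;
        }
        for (int k = 0; k < LOG_SHAPE.length; k++) {
            if (!tokens.get(start + k)[0].equals(LOG_SHAPE[k])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String log : findLogs(" 111 [ 123 begin ] 222")) {
            System.out.println("log=" + log); // prints: log=[ 123 begin ]
        }
    }
}
```

The catch-all alternative at the end of the pattern plays the same role as the ANY rule: every character becomes some token, so the scan never gets stuck on input it does not care about.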

Upvotes: 7
