Baptiste Pernet

Reputation: 3384

How to parse "Leg 1: Jun 25" with ANTLR

I am getting started with ANTLR4 and, after following some tutorials, I began writing my own grammar. For now, I want to parse a simple input: Leg 1: Jun 25.

fragment DIGIT
  : [0-9];
fragment MONTH
  : [A-Z][a-z][a-z];
DATE
  : MONTH ' ' DIGIT+;
LEG_NUMBER
  : DIGIT+;

leg
 : 'Leg ' LEG_NUMBER ': ' DATE;

But it does not work; I get the following error:

line 1:0 mismatched input 'Leg 1' expecting 'Leg '

I don't even understand the error message. Here is the parse tree in the IntelliJ ANTLR plugin:

[Image: parse tree]

Upvotes: 0

Views: 43

Answers (1)

Mike Cargal

Reputation: 6785

The parse tree is showing you that the Lexer has recognized your input as three tokens: a DATE ("Leg 1"), your implicitly defined ': ' token, and then another DATE ("Jun 25").

The first thing to understand is that the Lexer will first tokenize your input stream of characters into a stream of tokens. At this point in the processing, parser rules have absolutely no impact. Parser rules match against the stream of tokens (not your input stream of characters).

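Since your DATE rule says "uppercase letter, lowercase letter, lowercase letter, space, one or more digits", "Leg 1" is a match. At each position the Lexer picks the rule that matches the most characters, and DATE matches five characters ("Leg 1") where your implicit 'Leg ' token matches only four, so the input is recognized as a DATE token. The Lexer doesn't know (or care) that your parser rule wants to start by matching "Leg ".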
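To make that concrete, here are the rules from the question again, annotated with what each one can match at the very start of Leg 1: Jun 25. The comments rely on ANTLR's standard lexer behaviour of keeping the longest match at the current position; nothing in the rules themselves has been changed.

```antlr
// Input: Leg 1: Jun 25
// At offset 0 the lexer tries every token definition and keeps the longest match.

fragment DIGIT
  : [0-9];
fragment MONTH
  : [A-Z][a-z][a-z];   // any capitalized three-letter word, so "Leg" qualifies
DATE
  : MONTH ' ' DIGIT+;  // matches "Leg 1" (5 characters): the longest match, so it wins
LEG_NUMBER
  : DIGIT+;            // cannot match at offset 0, because the input starts with 'L'

leg
 : 'Leg ' LEG_NUMBER ': ' DATE;
// 'Leg ' is an implicit token and matches only "Leg " (4 characters), so it loses to DATE.
// The parser therefore receives DATE ': ' DATE and reports the mismatch at 'Leg 1'.
```

This is also why the error message reads "mismatched input 'Leg 1' expecting 'Leg '": the parser is describing tokens, not characters.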

It's always a good idea to run your input through a tool that shows you the token stream so you can validate your Lexer rules. That can be the grun alias with the -tokens option, or you should be able to view your token stream in the IntelliJ ANTLR plugin. (With some experience, you'll also recognize that the parse tree diagram is telling you the same thing.)

One way to fix this would be to tighten up the MONTH fragment:

fragment MONTH
    : (
        'Jan'
        | 'Feb'
        | 'Mar'
        | 'Apr'
        | 'May'
        | 'Jun'
        | 'Jul'
        | 'Aug'
        | 'Sep'
        | 'Oct'
        | 'Nov'
        | 'Dec'
    )
    ;

That will prevent "Leg 1" from matching. I'm not recommending that as a good path forward with a "real" grammar, but it does resolve this immediate issue as you start to work with ANTLR.
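For a grammar that has to grow, one common alternative is to keep lexer tokens small and unambiguous, skip whitespace in the lexer, and assemble the date in a parser rule. The following is only a sketch of that idea, not something taken from the question or a layout the answer itself prescribes. The grammar name Leg, the token names LEG, COLON, INT and WS, and the date parser rule are all made up for illustration.

```antlr
grammar Leg;

// Parser rules work on tokens, so whitespace never appears here.
leg
  : LEG INT COLON date EOF
  ;
date
  : MONTH INT
  ;

// Lexer rules: each token is small, and none overlaps with another.
LEG   : 'Leg' ;
COLON : ':' ;
MONTH
  : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun'
  | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec'
  ;
INT   : [0-9]+ ;
WS    : [ \t\r\n]+ -> skip ;   // discard whitespace instead of baking it into tokens
```

With this layout, Leg 1: Jun 25 should tokenize as LEG, INT, COLON, MONTH, INT, and the same INT token serves for both the leg number and the day. The trade-off is that skipping whitespace makes the grammar more lenient: an input such as Leg1:Jun25 would be accepted as well.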

Upvotes: 1
