Zoran Đukić

Reputation: 777

ANTLR 3 grammar omits part of the input source code

I am using this ANTLR 3 grammar and ANTLRWorks for testing that grammar.

But I can't figure out why some parts of my input text are omitted.

I would like to rewrite this grammar so that every element of the input source file (lparen, keywords, semicolon, ...) is displayed in the AST / CST.

I've tried everything, but without success. Can someone who is experienced with ANTLR help me?

[Image: parse tree]

Upvotes: 1

Views: 117

Answers (1)

Lucas Trzesniewski

Reputation: 51330

I've managed to narrow it down to the semic rule:

/*
This rule handles semicolons reported by the lexer and situations where the ECMA 3 specification states there should be semicolons automatically inserted.
The auto semicolons are not actually inserted but this rule behaves as if they were.

In the following situations an ECMA 3 parser should auto insert absent but grammatically required semicolons:
- the current token is a right brace
- the current token is the end of file (EOF) token
- there is at least one end of line (EOL) token between the current token and the previous token.

The RBRACE is handled by matching it but not consuming it.
The EOF needs no further handling because it is not consumed by default.
The EOL situation is handled by promoting the EOL or MultiLineComment with an EOL present from off channel to on channel
and thus making it parseable instead of handling it as white space. This promoting is done in the action promoteEOL.
*/
semic
@init
{
    // Mark current position so we can unconsume a RBRACE.
    int marker = input.mark();
    // Promote EOL if appropriate   
    promoteEOL(retval);
}
    : SEMIC
    | EOF
    | RBRACE { input.rewind(marker); }
    | EOL | MultiLineComment // (with EOL in it)
    ;

So, the EVIL semicolon insertion strikes again!

I'm not entirely sure, but I think these mark/rewind calls are getting out of sync. The @init block runs every time the rule is entered, both for branch selection and for actual matching, so it creates a lot of marks that are never cleaned up. I don't know exactly why that messes up the parse tree the way it does, though.

Anyway, here's a working version of the same rule:

semic
@init
{
    // Promote EOL if appropriate   
    promoteEOL(retval);
}
    : SEMIC
    | EOF
    | { int pos = input.index(); } RBRACE { input.seek(pos); }
    | EOL | MultiLineComment // (with EOL in it)
    ;

It's much simpler and doesn't use the mark/rewind mechanism.
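To make the index/seek idea concrete, here is a minimal sketch in plain Java. ToyTokenStream and SeekDemo are hypothetical stand-ins written just for this illustration, not ANTLR runtime classes; only the index() and seek() names mirror the calls used in the rule above. It shows the pattern of matching a } and then putting it back so the enclosing rule can still consume it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a token stream; NOT the ANTLR runtime class.
final class ToyTokenStream {
    private final List<String> tokens;
    private int p = 0; // index of the next token to consume

    ToyTokenStream(List<String> tokens) {
        this.tokens = tokens;
    }

    // Current position in the stream.
    int index() {
        return p;
    }

    // Jump back (or forward) to an absolute position.
    void seek(int index) {
        p = index;
    }

    // Return the current token and advance by one.
    String consume() {
        return tokens.get(p++);
    }
}

final class SeekDemo {
    // Mimics the RBRACE alternative: remember the position, match "}",
    // then seek back so the token is left unconsumed.
    static boolean matchRbraceWithoutConsuming(ToyTokenStream input) {
        int pos = input.index();
        boolean matched = "}".equals(input.consume());
        input.seek(pos);
        return matched;
    }

    public static void main(String[] args) {
        ToyTokenStream input = new ToyTokenStream(Arrays.asList("}", "EOF"));
        System.out.println(matchRbraceWithoutConsuming(input));
        System.out.println(input.index());
    }
}
```

The position is saved and restored within a single alternative, so nothing is left outstanding if the rule is entered again.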

But there's a catch: when a semicolon is inserted before a closing brace, the semic rule in the parse tree will have a } child node. Try removing the semicolon after i-- to see the result. You'll have to detect this and handle it in your code. Otherwise, semic should contain either a ; token or an EOL (which means a semicolon was silently inserted at that point).
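One way to handle that detection is sketched below. It assumes your tree-walking code can get the token type name of the semic node's child as a string; how you obtain it depends on your setup, so SemicClassifier and its classify method are hypothetical helpers, not part of ANTLR. The token names (SEMIC, RBRACE, EOL, MultiLineComment, EOF) are the ones used in the rule above.

```java
// Hypothetical helper; the token type names come from the semic rule.
final class SemicClassifier {
    enum Kind {
        EXPLICIT,               // a real ";" was present in the source
        INSERTED_BEFORE_RBRACE, // "}" child: semicolon inserted before the brace
        INSERTED_AT_EOL,        // EOL or multi-line comment containing an EOL
        INSERTED_AT_EOF         // end of input
    }

    // Maps the token type name of the semic node's child to what it means.
    static Kind classify(String childTokenType) {
        switch (childTokenType) {
            case "SEMIC":
                return Kind.EXPLICIT;
            case "RBRACE":
                return Kind.INSERTED_BEFORE_RBRACE;
            case "EOL":
            case "MultiLineComment":
                return Kind.INSERTED_AT_EOL;
            case "EOF":
                return Kind.INSERTED_AT_EOF;
            default:
                throw new IllegalArgumentException(
                    "Unexpected child of semic: " + childTokenType);
        }
    }
}
```

In the RBRACE case the } has not actually been consumed by semic, so the same brace should show up again under the enclosing block; skip the copy under semic when you print or process the tree.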

Upvotes: 1
