Reputation: 61
I am new to creating compilers and interpreters, and as an exercise I wrote a handwritten lexer in Java that produces tokens that look like the following:
public Token(TokenType type, String lexeme, Object literal, int line) {
    this.type = type;
    this.lexeme = lexeme;
    this.literal = literal;
    this.line = line;
}
Now I want to create a parser using ANTLR. Sadly, I am running into some issues when trying to link my lexer with the ANTLR-generated parser. I have tried to implement a TokenSource (an ANTLR interface, see: https://www.antlr.org/api/Java/org/antlr/v4/runtime/TokenSource.html), which can be wrapped in a token stream that the parser reads from.
My first question: Is this a good approach or are there better ways to link a custom lexer with an ANTLR-generated parser?
My second question: ANTLR token types are integers, so the interface wants me to implement a getType() that returns an int. My token types live in an enum (so each has an ordinal), but how do I link these integers with the token types in the ANTLR parser grammar, so that both sides agree on the same integer for the same type?
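For the mapping question, the generated parser exposes its token types as public static final int constants (and also writes them to a .tokens file). One way to keep a hand-written enum in sync is an explicit EnumMap from your enum to those constants. A minimal sketch, where the ANTLR-side constant values below are hypothetical stand-ins for whatever your generated parser actually defines:

```java
import java.util.EnumMap;

public class TokenTypeMapping {
    // The hand-written lexer's token types.
    enum TokenType { IDENTIFIER, NUMBER, PLUS, EOF }

    // Stand-ins for the constants an ANTLR-generated parser exposes
    // (e.g. MyLangParser.IDENTIFIER); the real values come from the
    // generated parser class / .tokens file and will differ.
    static final int ANTLR_IDENTIFIER = 1;
    static final int ANTLR_NUMBER = 2;
    static final int ANTLR_PLUS = 3;
    static final int ANTLR_EOF = -1; // ANTLR's Token.EOF is -1

    private static final EnumMap<TokenType, Integer> TO_ANTLR =
            new EnumMap<>(TokenType.class);
    static {
        TO_ANTLR.put(TokenType.IDENTIFIER, ANTLR_IDENTIFIER);
        TO_ANTLR.put(TokenType.NUMBER, ANTLR_NUMBER);
        TO_ANTLR.put(TokenType.PLUS, ANTLR_PLUS);
        TO_ANTLR.put(TokenType.EOF, ANTLR_EOF);
    }

    // What a custom TokenSource's getType() would return for a token.
    static int toAntlrType(TokenType type) {
        return TO_ANTLR.get(type);
    }

    public static void main(String[] args) {
        System.out.println(toAntlrType(TokenType.NUMBER)); // prints 2
        System.out.println(toAntlrType(TokenType.EOF));    // prints -1
    }
}
```

Using an EnumMap (rather than relying on enum ordinals matching the generated numbers) keeps the mapping explicit, so reordering the enum or regenerating the grammar cannot silently shift the types.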
Upvotes: 0
Views: 160
Reputation: 53337
I see no reason why you cannot use a hand-written lexer with a generated ANTLR4 parser. All that matters is that you fulfil the contract specified by the TokenSource interface; how you do that is entirely up to you. Except for error recovery, the parser never looks at anything but the token type (though it needs the token instances to build the parse tree).
A similar route is taken for the next layer in the stack: the token stream, which wraps a token source and feeds the parser with token instances. There is one implementation (BufferedTokenStream, usually used via its subclass CommonTokenStream) that caches all tokens the lexer returns, and another (UnbufferedTokenStream) that only buffers a small window, which can be used for an endless stream of tokens.
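The difference between the two buffering strategies can be illustrated with a toy stream in plain Java (no ANTLR types); this is only an analogue of what the caching implementation does, pulling tokens from a source on demand and remembering every one so the parser can look back:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Toy analogue of a caching token stream: fetches tokens lazily from a
// source and keeps all of them, so earlier positions stay addressable.
public class CachingStream {
    private final Supplier<String> source;        // stand-in for a TokenSource
    private final List<String> cache = new ArrayList<>();

    public CachingStream(Supplier<String> source) {
        this.source = source;
    }

    // Return the token at absolute index i, fetching and caching as needed.
    public String get(int i) {
        while (cache.size() <= i) {
            cache.add(source.get());
        }
        return cache.get(i);
    }

    public static void main(String[] args) {
        Iterator<String> it = List.of("ID", "PLUS", "NUM", "EOF").iterator();
        CachingStream s = new CachingStream(it::next);
        System.out.println(s.get(2)); // prints NUM (pulls three tokens)
        System.out.println(s.get(0)); // prints ID (served from the cache)
    }
}
```

An unbuffered variant would instead discard tokens behind a small sliding window, trading random access for constant memory, which is what makes it suitable for endless input.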
Upvotes: 0