Dennie de Lange

Reputation: 2934

Antlr generated files

I'm trying to understand the files that ANTLR generates. I have two input .g4 files (TSqlParser.g4 and TSqlLexer.g4).

When running ANTLR (4.7.2) using:

java -cp .;antlr.jar org.antlr.v4.Tool -Dlanguage=CSharp *.g4

it generates the following files:

/
│   TSqlLexer.cs
│   TSqlLexer.interp
│   TSqlLexer.tokens
│   TSqlParser.cs
│   TSqlParser.interp
│   TSqlParser.tokens
│   TSqlParserBaseListener.cs
│   TSqlParserListener.cs

What are the *.interp and the *.tokens files? Are these helper files? I couldn't find any documentation about them. If they are helper files, why aren't they cleaned up automatically?

Upvotes: 3

Views: 3401

Answers (1)

Mike Lischke

Reputation: 53542

The .interp and .tokens files serve specific purposes and are usually not of interest for a grammar author.

  • .tokens file: contains a list of token names and their numeric assignments as generated by ANTLR4. These are created for lexers only. When you add a tokenVocab option to your parser grammar (this applies only to split grammars), ANTLR4 actually uses this tokens file, not the lexer grammar. This means the lexer must be generated first so that the tokens file is ready for parser generation. By the way, you can also have a tokenVocab setting in the lexer to import token assignments from other sources. This can, for instance, be used to specify explicit token values, independent of the order in which the lexer rules appear in the grammar. This is a great help if you want to ensure that certain tokens have very specific token types (e.g. to put all keywords in a continuous range to allow quick checks for them). I use this approach in the parser of MySQL Workbench.
  • .interp: this is a relatively new addition to ANTLR4 and contains the data needed to run the built-in interpreter instead of the generated parser. This is mostly of use for IDEs that let you debug a grammar, like my ANTLR4 extension for vscode. The file contains exactly the same information as the generated parser/lexer file (token/rule names, their display names, the serialized ATN, plus mode and channel names in the case of lexers).
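
To make the .tokens part more concrete, here is a minimal Python sketch. It assumes the usual layout of such a file, one NAME=number entry per line with literals written in single quotes. The sample content, the token names, and the helpers parse_tokens and is_keyword are made up for illustration; they are not taken from TSqlLexer.tokens or from ANTLR itself. It also shows why a continuous keyword range is handy: the check becomes a simple comparison.

```python
def parse_tokens(text):
    """Parse the text of a .tokens file into (symbolic, literal) dicts."""
    symbolic, literal = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST '=' because a literal such as '=' contains one itself.
        name, _, value = line.rpartition("=")
        if name.startswith("'"):
            literal[name] = int(value)
        else:
            symbolic[name] = int(value)
    return symbolic, literal


# Hypothetical sample content, assumed for this sketch only.
SAMPLE = """\
SELECT=1
FROM=2
WHERE=3
ID=4
EQUAL=5
'='=5
"""

symbolic, literal = parse_tokens(SAMPLE)

# Assumption: the keywords occupy one continuous range of token types.
FIRST_KEYWORD = symbolic["SELECT"]
LAST_KEYWORD = symbolic["WHERE"]


def is_keyword(token_type):
    """Range check that only works if keyword token types are continuous."""
    return FIRST_KEYWORD <= token_type <= LAST_KEYWORD


print(is_keyword(symbolic["FROM"]))  # keyword inside the range
print(is_keyword(symbolic["ID"]))    # identifier outside the range
```

In a real project the text would come from reading the generated file, and the range boundaries would be whatever token names you chose to pin down through tokenVocab.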

Upvotes: 6
