Reputation: 521409
I am trying to build a parser using Antlr4 for SQL statements. I don't really care which particular grammar of SQL I use, as I plan to enforce that only ANSI SQL is allowed, but in the example below I happen to be using the grammar for T-SQL. Here is my simple code:
String sql = "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM some_table";
TSqlLexer tSqlLexer = new TSqlLexer(CharStreams.fromString(sql));
CommonTokenStream stream = new CommonTokenStream(tSqlLexer);
TSqlParser parser = new TSqlParser(stream);
ParseTree tree = parser.tsql_file(); // errors happen here
ParseTreeWalker walker = new ParseTreeWalker();
// I built a custom listener, so far not much in it
AnalyticFunctionBaseListener listener = new AnalyticFunctionBaseListener();
walker.walk(listener, tree);
The code only gets as far as the call to tsql_file() before generating the following errors/warnings:
line 1:35 token recognition error at: 'i'
line 1:36 token recognition error at: 'd'
line 1:44 token recognition error at: 's'
line 1:45 token recognition error at: 'o'
line 1:46 token recognition error at: 'm'
line 1:47 token recognition error at: 'e'
line 1:49 token recognition error at: 't'
line 1:50 token recognition error at: 'a'
line 1:51 token recognition error at: 'b'
line 1:52 token recognition error at: 'l'
line 1:53 token recognition error at: 'e'
line 1:37 no viable alternative at input 'SELECTROW_NUMBER()OVER(ORDERBY)'
There is clearly something major I am missing here, but I don't know what that is. I built the parser from the published T-SQL grammar available at the ANTLR GitHub site.
Can any Antlr guru modify the above snippet so that it works? I am hoping someone can give a canonical example of how to use Antlr to parse a basic SQL statement.
Upvotes: 4
Views: 2782
Reputation: 170158
Note the following comment in the README:
Usage, important note
As SQL grammar are normally not case sensitive but this grammar implementation is, you must use a custom character stream that converts all characters to uppercase before sending them to the lexer.
You could find more information here with implementations for various target languages.
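That matches your output: every token recognition error is reported at a lowercase character (the letters of id and some_table), while the uppercase keywords are lexed fine. In short, change your code: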
String sql = "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM some_table";
TSqlLexer tSqlLexer = new TSqlLexer(CharStreams.fromString(sql));
to:
String sql = "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM some_table";
CharStream s = CharStreams.fromString(sql);
TSqlLexer tSqlLexer = new TSqlLexer(new CaseChangingCharStream(s, true));
Find the source of CaseChangingCharStream here: https://github.com/antlr/antlr4/blob/master/doc/resources/CaseChangingCharStream.java
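To make the idea behind that wrapper concrete, here is a minimal sketch of the technique in plain Java with no ANTLR dependency. The class name UpperCasingView and its methods are hypothetical and only loosely mirror the shape of a character stream; this is not the actual CaseChangingCharStream source. The point it illustrates is that only the lookahead the lexer matches against is upper-cased, while token text is still read from the untouched input, so id stays id in your parse tree.

```java
// Simplified stand-in for illustration only (hypothetical class, NOT the
// ANTLR CaseChangingCharStream): upper-case what the lexer "sees",
// but keep the original text for the tokens.
final class UpperCasingView {
    private final String text; // original input, never modified
    private int pos = 0;       // index of the next character to consume

    UpperCasingView(String text) {
        this.text = text;
    }

    // Lookahead used for matching: the i-th character ahead (1-based),
    // converted to upper case. Returns -1 when past the end of the input.
    int la(int i) {
        int idx = pos + i - 1;
        if (idx < 0 || idx >= text.length()) {
            return -1;
        }
        return Character.toUpperCase(text.charAt(idx));
    }

    // Advance past one character, if any remain.
    void consume() {
        if (pos < text.length()) {
            pos++;
        }
    }

    // Token text comes from the original input, so its case is preserved.
    String getText(int start, int stop) {
        return text.substring(start, stop + 1);
    }

    public static void main(String[] args) {
        UpperCasingView v = new UpperCasingView("id");
        System.out.println((char) v.la(1));  // prints I
        System.out.println(v.getText(0, 1)); // prints id
    }
}
```

With the real wrapper the effect is the same: the lexer rules written for uppercase input match, and the text you read back from tokens keeps the case used in the original SQL string.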
In the comments, Mike suggests:
Alternatively you can use the MySQL grammar, which supports case-insensitive keywords without an extra stream
which might be a better option. I'm not saying the T-SQL grammar isn't good or accurate, but the fact that Mike's suggested grammar comes from the official MySQL repo (and that Mike contributed to it) would give me confidence in its quality.
Upvotes: 3