ldl

Parsing a SQL CREATE TABLE statement using ANTLR4

My lexer grammar is as follows:

lexer grammar CreateLexer;

CREATE
   : 'create' | 'CREATE'
   ;

NUMBER_OF_SHARDS:'number_of_shards' | 'NUMBER_OF_SHARDS';


NUMBER_OF_REPLICAS:'number_of_replicas' | 'NUMBER_OF_REPLICAS';


ID
  : ( 'a' .. 'z' | 'A' .. 'Z' | '_' | '\u4e00' .. '\u9fa5' | '-')+
  ;


INT
  : [0-9]+
  ;


NEWLINE
  : '\r'? '\n' -> skip
  ;


WS
  : [\t\r\n]+ -> skip
  ;


INDEX
  : 'index' | 'INDEX'
  ;

TABLE:'table';

and the parser grammar is as follows:

parser grammar CreateParser;

options
   { tokenVocab = CreateLexer; }
stat
   : create_clause
   ;

create_clause
   : CREATE INDEX index_name shards? replicas?
   ;

index_name
   : (ID)*(INT)*
   ;

shards
   : NUMBER_OF_SHARDS INT
   ;

replicas
   : NUMBER_OF_REPLICAS INT
   ;

and this is my test code, which shows how I use the grammars above:

String sql = "create index A number_of_shards 1 number_of_replicas 1";
CreateLexer createLexer = new CreateLexer(new ANTLRInputStream(sql));
createLexer.removeErrorListeners();

CreateParser parser = new CreateParser(new CommonTokenStream(createLexer));
ParseTree tree = parser.stat();
System.out.println(tree.toStringTree(parser));

When I run the test code above, I get an error:

line 1:7 missing INDEX at 'index'
(stat (create_clause create <missing INDEX> (index_name index A) (shards number_of_shards 1) (replicas number_of_replicas 1)))

I then replaced INDEX with TABLE in the create_clause rule of the parser file, and replaced 'index' with 'table' in the test code:

Test code:

String sql = "create table A number_of_shards 1 number_of_replicas 1";

Parser file:

create_clause
   : CREATE TABLE index_name shards? replicas?
   ;

When I ran it again, I still got the same error:

line 1:7 missing 'table' at 'table'
(stat (create_clause create <missing 'table'> (index_name table A) (shards number_of_shards 1) (replicas number_of_replicas 1)))

However, after I deleted the keyword TABLE from the parser rule as follows:

create_clause
   : CREATE index_name shards? replicas?
   ;

something weird happens: I get no error:

(stat (create_clause create (index_name table A) (shards number_of_shards 1) (replicas number_of_replicas 1)))

Can anyone tell me why a SQL statement like 'CREATE TABLE' cannot be parsed? Am I missing anything? Thanks in advance!


Answers (1)

GRosenberg

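ANTLR's lexer picks a rule by match length first: the rule that matches the longest prefix of the remaining input wins, and when several rules match the same length, the one defined first in the grammar wins. Your ID rule is defined before INDEX and TABLE and matches 'index' and 'table' just as long as they do, so the INDEX and TABLE rules will never be matched; that text is emitted as ID tokens instead. (CREATE, NUMBER_OF_SHARDS and NUMBER_OF_REPLICAS are unaffected because they are defined above ID.) Moving the INDEX and TABLE rules above ID fixes this.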
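To make the rule-selection order concrete, here is a small self-contained Java sketch. It is not ANTLR's implementation (ANTLR compiles lexer rules into a DFA); it only models the two selection rules described above, longest match first and then earliest rule on a tie, using java.util.regex patterns that approximate the question's lexer rules. The class, method and field names are hypothetical. The sketch skips all whitespace itself; note that the question's WS rule, [\t\r\n]+, has no space character in it, so this is a simplification rather than a copy of that rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerRuleOrderDemo {
    // One lexer rule: a token type name plus a regex approximating the grammar rule.
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    static final Rule CREATE = new Rule("CREATE", "create|CREATE");
    static final Rule NUMBER_OF_SHARDS = new Rule("NUMBER_OF_SHARDS", "number_of_shards|NUMBER_OF_SHARDS");
    static final Rule NUMBER_OF_REPLICAS = new Rule("NUMBER_OF_REPLICAS", "number_of_replicas|NUMBER_OF_REPLICAS");
    static final Rule ID = new Rule("ID", "[a-zA-Z_\\u4e00-\\u9fa5-]+");
    static final Rule INT = new Rule("INT", "[0-9]+");
    static final Rule INDEX = new Rule("INDEX", "index|INDEX");
    static final Rule TABLE = new Rule("TABLE", "table");

    // Same order as the question's CreateLexer: ID comes before INDEX and TABLE.
    static final List<Rule> QUESTION_ORDER =
            List.of(CREATE, NUMBER_OF_SHARDS, NUMBER_OF_REPLICAS, ID, INT, INDEX, TABLE);
    // Hypothetical reordering with the keyword rules moved above ID.
    static final List<Rule> KEYWORDS_FIRST =
            List.of(CREATE, NUMBER_OF_SHARDS, NUMBER_OF_REPLICAS, INDEX, TABLE, ID, INT);

    // Returns the token stream as "TYPE 'text'" strings.
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { // simplification: skip all whitespace
                pos++;
                continue;
            }
            String bestName = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule listed first wins.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule.name;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName + " '" + input.substring(pos, pos + bestLen) + "'");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sql = "create index A number_of_shards 1 number_of_replicas 1";
        System.out.println("question's order: " + tokenize(sql, QUESTION_ORDER));
        System.out.println("keywords first:   " + tokenize(sql, KEYWORDS_FIRST));
    }
}
```

With the question's rule order the second token comes out as ID 'index'; with INDEX and TABLE listed before ID it comes out as INDEX 'index', which is what create_clause expects. Printing the token stream like this is the kind of check the answer recommends below.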

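By removing the requirement for an explicit INDEX (or TABLE) token, you removed the cause of the error: the ID token produced for 'table' is simply consumed by index_name : (ID)*(INT)*, which is why the tree shows (index_name table A).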
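The parser side can be sketched the same way. The hand-written Java check below is a hypothetical stand-in for create_clause, index_name, shards and replicas, not ANTLR-generated code, and it stops at the first mismatch instead of performing ANTLR's error recovery (the <missing ...> node in the trees above). It takes the token types the question's lexer produces for "create table A number_of_shards 1 number_of_replicas 1" and shows that requiring TABLE fails at the second token, while dropping that requirement lets (ID)* consume both ID tokens.

```java
import java.util.List;

public class CreateClauseDemo {
    // Token types the question's lexer emits for
    // "create table A number_of_shards 1 number_of_replicas 1" ('table' becomes ID).
    static final List<String> TOKENS = List.of(
            "CREATE", "ID", "ID", "NUMBER_OF_SHARDS", "INT", "NUMBER_OF_REPLICAS", "INT");

    // Returns the token type at position i, or "EOF" past the end.
    static String peek(List<String> tokens, int i) {
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    // Hypothetical stand-in for:
    //   create_clause : CREATE <keyword> index_name shards? replicas? ;
    //   index_name    : (ID)* (INT)* ;
    //   shards        : NUMBER_OF_SHARDS INT ;
    //   replicas      : NUMBER_OF_REPLICAS INT ;
    // Pass keyword = null for the variant without TABLE/INDEX.
    // Returns "ok" on success, otherwise a description of the first mismatch.
    static String parseCreateClause(List<String> tokens, String keyword) {
        int i = 0;
        if (!peek(tokens, i).equals("CREATE")) return "missing CREATE at token " + i;
        i++;
        if (keyword != null) {
            if (!peek(tokens, i).equals(keyword)) {
                return "missing " + keyword + " at token " + i + " (found " + peek(tokens, i) + ")";
            }
            i++;
        }
        while (peek(tokens, i).equals("ID")) i++;  // index_name: (ID)*
        while (peek(tokens, i).equals("INT")) i++; // index_name: (INT)*
        if (peek(tokens, i).equals("NUMBER_OF_SHARDS")) { // shards?
            if (!peek(tokens, i + 1).equals("INT")) return "missing INT at token " + (i + 1);
            i += 2;
        }
        if (peek(tokens, i).equals("NUMBER_OF_REPLICAS")) { // replicas?
            if (!peek(tokens, i + 1).equals("INT")) return "missing INT at token " + (i + 1);
            i += 2;
        }
        return peek(tokens, i).equals("EOF") ? "ok" : "unexpected " + peek(tokens, i) + " at token " + i;
    }

    public static void main(String[] args) {
        System.out.println("CREATE TABLE index_name ...: " + parseCreateClause(TOKENS, "TABLE"));
        System.out.println("CREATE index_name ...:       " + parseCreateClause(TOKENS, null));
    }
}
```

The first call reports a missing TABLE at token 1 because the lexer handed the parser an ID there; the second succeeds because index_name's (ID)* absorbs the ID for 'table' along with the one for 'A'.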

As a general rule, always dump the token stream so that you can see what the lexer is actually doing.
