Reputation: 1101
I am using ANTLR 3 to do the below.
Assume I have an SQL query. I know that, in general, its WHERE, ORDER BY and GROUP BY clauses are optional. In terms of ANTLR's grammar I would describe that like this:
query : select_clause from_clause where_clause? group_by_clause? order_by_clause?
The rule for each clause will obviously start with the respective keyword.
What I actually need is to extract each clause's contents as a string without dealing with its internal structure.
To do this I started with the following grammar:
query : select_clause from_clause where_clause? group_by_clause? order_by_clause? EOF;
select_clause : SELECT_CLAUSE ;
from_clause : FROM_CLAUSE ;
where_clause : WHERE_CLAUSE ;
group_by_clause : GROUP_BY_CLAUSE ;
order_by_clause : ORDER_BY_CLAUSE ;
SELECT_CLAUSE : 'select' ANY_CHAR*;
FROM_CLAUSE : 'from' ANY_CHAR*;
WHERE_CLAUSE : 'where' ANY_CHAR*;
GROUP_BY_CLAUSE : 'group by' ANY_CHAR*;
ORDER_BY_CLAUSE : 'order by' ANY_CHAR*;
ANY_CHAR : .;
WS : ' '+ {skip();};
This one didn't work. I made further attempts to compose a correct grammar, with no success. I suspect this task is doable with ANTLR 3, but I am missing something.
More generally, I would like to collect characters from the input stream into a single token until I meet a specific keyword that indicates the beginning of a new token. That keyword should be part of the new token.
Can you help me please?
Upvotes: 4
Views: 1185
Reputation: 170158
Rather than making ANY_CHAR* part of your tokens, why not move it into parser rules? You could even "glue" these single-character tokens together using a rewrite rule.
A quick demo:
grammar T;
options { output=AST; }
tokens { QUERY; ANY; }
query : select_clause from_clause where_clause? group_by_clause? order_by_clause? EOF
-> ^(QUERY select_clause from_clause where_clause? group_by_clause? order_by_clause?)
;
select_clause : SELECT_CLAUSE^ any;
from_clause : FROM_CLAUSE^ any;
where_clause : WHERE_CLAUSE^ any;
group_by_clause : GROUP_BY_CLAUSE^ any;
order_by_clause : ORDER_BY_CLAUSE^ any;
any : ANY_CHAR* -> ANY[$text];
SELECT_CLAUSE : 'select';
FROM_CLAUSE : 'from';
WHERE_CLAUSE : 'where';
GROUP_BY_CLAUSE : 'group' S+ 'by';
ORDER_BY_CLAUSE : 'order' S+ 'by';
ANY_CHAR : . ;
WS : S+ {skip();};
fragment S : ' ' | '\t' | '\r' | '\n';
If you now parse the input:
select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER
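an AST is created with QUERY as its root and one subtree per clause present in the input (here select, from and order by). Each subtree is rooted at the clause's keyword token and has a single ANY child carrying the text matched by the any rule.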
Trying to do something similar in your lexer would be messy, and would mean some custom code (or predicates) to check for keywords up ahead in the char-stream (both not pretty!).
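To give an idea of what such hand-written keyword look-ahead involves, here is a minimal sketch in plain Java that sits outside ANTLR entirely. It scans the input for the clause keywords and cuts the string at each one, so every piece starts with its keyword. The class name ClauseSplitter and the method split are hypothetical names chosen for this illustration, not part of ANTLR or any library. The sketch also assumes the keywords never occur inside string literals, identifiers with word boundaries, or subqueries; real SQL would need more care.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an ANTLR API.
class ClauseSplitter {
    // Keywords that open a new clause. \b stops "from" matching inside
    // a longer word, and \s+ allows any whitespace between "group"/"order" and "by".
    private static final Pattern KEYWORD = Pattern.compile(
        "\\b(select|from|where|group\\s+by|order\\s+by)\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns one string per clause, each beginning with its keyword.
    // Any text before the first keyword is dropped.
    static List<String> split(String sql) {
        List<String> clauses = new ArrayList<>();
        Matcher m = KEYWORD.matcher(sql);
        int start = -1; // start offset of the clause currently being collected
        while (m.find()) {
            if (start >= 0) {
                // A new keyword ends the previous clause.
                clauses.add(sql.substring(start, m.start()).trim());
            }
            start = m.start();
        }
        if (start >= 0) {
            // The last clause runs to the end of the input.
            clauses.add(sql.substring(start).trim());
        }
        return clauses;
    }

    public static void main(String[] args) {
        String sql = "select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER";
        for (String clause : split(sql)) {
            System.out.println(clause);
        }
    }
}
```

Even this small version has to decide what counts as a keyword boundary, which is the sort of logic you would otherwise have to embed in lexer actions or predicates.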
Upvotes: 2