Reputation: 29632
I am about to write a parser for OpenEdge (a 4GL database language) and I would like to use ANTLR (or similar).
There are two reasons I think this may be a problem:
OpenEdge is a 4GL database language which allows constructs like:
assign
customer.name = 'Customer name'
customer.age = 20
.
Where the . at the end is the statement terminator, and this single statement combines the assignment of two database fields. OpenEdge has many more of these constructs;
I need to preserve all details of the source files, so I cannot expand preprocessor statements before I can parse the file, so:
// file myinc.i
7 * 14
// source.p
assign customer.age = {myinc.i}.
In the above example, I need to preserve the fact that customer.age was assigned using {myinc.i} instead of 7 * 14.
Can I use ANTLR to achieve this or do I need to write my own parser?
UPDATE:
I need this parser not to generate an executable from it, but rather for code analysis. This is why I need the AST to contain the fact that the include was used.
Upvotes: 4
Views: 782
Reputation: 21
The solution lies within OpenEdge Architect itself. You should check out the OpenEdge Architect jar files (C:\Progress\OpenEdge\oeide\eclipse\plugins\com.openedge.pdt.core_10.2.1.01\lib\progressparser.jar)
Here you will find the parser classes. They are tied closely to Eclipse, but I separated them from the Eclipse framework, and it works. The progressparser uses ANTLR, and the ANTLR grammar can be found in the following file: C:\Progress\OpenEdge\oeide\eclipse\plugins\com.openedge.pdt.core_10.2.1.01\oe_common_services.jar.
Inside that file you will find the ANTLR definition (look for openedge.g).
Good luck. If you want the separated eclipse environment just drop me a mail.
Upvotes: 2
Reputation: 95324
The issue with multiple assignments is easy enough to handle in a grammar. Just allow multiple assignments:
assign_stmt = 'assign' assignments '.' ;
assignments = ;
assignments = assignments target '=' expression ;
One method you can use is to augment the grammar to allow preprocessor token sequences wherever a nonterminal can be allowed, and simply not do preprocessor expansion. For your example, you have some grammar rule:
expression = ... ;
just add the rule:
expression = '{' include_reference '}' ;
This works to the extent that the preprocessor isn't used abusively to generate several language elements that span nonterminal boundaries.
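To make the two grammar tricks above concrete, here is a minimal sketch as a hand-written recursive-descent parser in Python (not ANTLR, and not how any real OpenEdge parser does it). All names (Parser, IncludeRef, Assignment and so on) are made up for illustration, and it only covers the tiny subset of syntax used in the question. The point is that an unexpanded {include} is accepted wherever an expression is allowed and survives into the tree.

```python
import re
from dataclasses import dataclass

# One token per match: include reference, string, number, dotted name, operator.
# A name never ends on '.', so the statement terminator stays a separate token.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<include>\{[^}]*\})
      | (?P<string>'[^']*')
      | (?P<number>\d+)
      | (?P<name>[A-Za-z_][\w-]*(?:\.[A-Za-z_][\w-]*)*)
      | (?P<op>[=.*+/-])
    )""", re.VERBOSE)

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

@dataclass
class IncludeRef:      # hypothetical node: an include that was NOT expanded
    filename: str

@dataclass
class Literal:
    text: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assignment:
    target: str
    value: object

@dataclass
class AssignStmt:
    assignments: list

class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else (None, None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.i += 1
        return tok[1]

    def assign_stmt(self):
        # assign_stmt = 'assign' { target '=' expression } '.'
        self.take("name", "assign")
        items = []
        while self.peek() != ("op", "."):
            target = self.take("name")
            self.take("op", "=")
            items.append(Assignment(target, self.expression()))
        self.take("op", ".")
        return AssignStmt(items)

    def expression(self):
        # Flat left-to-right chain; operator precedence is ignored in this sketch.
        left = self.primary()
        while self.peek()[0] == "op" and self.peek()[1] in "+-*/":
            op = self.take("op")
            left = BinOp(op, left, self.primary())
        return left

    def primary(self):
        kind, value = self.peek()
        if kind == "include":
            # expression = '{' include_reference '}'
            self.i += 1
            return IncludeRef(value[1:-1].strip())
        if kind in ("string", "number", "name"):
            self.i += 1
            return Literal(value)
        raise SyntaxError(f"unexpected token {value!r}")
```

Parsing "assign customer.age = {myinc.i}." with Parser(...).assign_stmt() gives one Assignment whose value is an IncludeRef for myinc.i rather than a 7 * 14 subtree, which is the information the question wants to keep.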
What kind of code analysis do you intend to do? To do pretty much anything, you'll need name and type resolution, which will require expanding the preprocessor directives. In that case, you'll need a more sophisticated scheme, because you need the expanded tree to do the name resolution, and you need the include information kept off to the side.
Our DMS Software Reengineering Toolkit has an OpenEdge parser, in which we presently do the previous "keep the include file references" trick. DMS's C parser adds a "macro node" to the tree at the point where the macro is used (an OpenEdge "include" is just a funny way to write a macro definition). Its child nodes contain the tree as you would expect it, together with reference information that points back to the macro definition. This requires some careful organization, and lots of special handling of macro nodes where they occur.
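As a rough illustration of the "macro node" idea, here is a small Python sketch. It is not DMS's data structure; Node, MacroNode, include_file, expanded_text and source_text are all hypothetical names. The macro node keeps the expanded subtree as its children and carries the include reference on the side, so the same tree can be read either way.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

@dataclass
class MacroNode(Node):
    # Hypothetical side information: which include produced the children.
    include_file: str = ""

def expanded_text(node):
    """Text as seen after preprocessing (what name/type resolution needs)."""
    if not node.children:
        return node.text
    return " ".join(expanded_text(c) for c in node.children)

def source_text(node):
    """Text as written in the source file, with include references intact."""
    if isinstance(node, MacroNode):
        return "{" + node.include_file + "}"
    if not node.children:
        return node.text
    return " ".join(source_text(c) for c in node.children)

# Tree for: assign customer.age = {myinc.i}.   where myinc.i contains 7 * 14
tree = Node(kind="assign_stmt", children=[
    Node(kind="keyword", text="assign"),
    Node(kind="name", text="customer.age"),
    Node(kind="op", text="="),
    MacroNode(kind="macro", include_file="myinc.i", children=[
        Node(kind="number", text="7"),
        Node(kind="op", text="*"),
        Node(kind="number", text="14"),
    ]),
    Node(kind="op", text="."),
])
```

Here expanded_text(tree) walks through the macro node into 7 * 14, while source_text(tree) stops at the macro node and prints {myinc.i}. Every other tree walker has to make the same choice at macro nodes, which is the "special handling" mentioned above.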
Upvotes: 0
Reputation: 31564
Are you aware that there is already an open source parser for OpenEdge / Progress 4GL? It is called Proparse, written using ANTLR (originally it was hand-coded in OpenEdge itself, but eventually converted to ANTLR). It is written in Java, but I think you can run it in C# by using IKVM.
The license is the Eclipse license, so it is business-friendly.
Upvotes: 1
Reputation: 170138
Just to clarify: ANTLR isn't a parser, but a parser generator.
You either write your own parser for the language, or you write an (ANTLR) grammar for it, and let ANTLR generate the lexer and parser for you. You can mix custom code into your grammar to keep track of your assignments.
So, the answer is: yes, you can use ANTLR.
Note that I am unfamiliar with OpenEdge, but SQL dialects are usually tough to write parsers or grammars for. Have a look at the ANTLR wiki to see that it's no trivial task to write one from the ground up. You didn't mention it, but I assume you've looked at existing parsers that can parse your language?
FYI: you might already have it, but here's a link to the documentation including a BNF grammar for the OpenEdge SQL dialect: http://www.progress.com/progress/products/documentation/docs/dmsrf/dmsrf.pdf
Upvotes: 3
Reputation: 9714
You can do the same thing the C preprocessor does: extend your grammar with some sort of pragmas that set a source location, and let your preprocessor generate code annotated with those pragmas.
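A minimal Python sketch of that idea, under some loud assumptions: the /*@begin file*/ ... /*@end*/ marker syntax is invented here (it is not an OpenEdge or C preprocessor feature), include files are passed in as a plain dict, and nested includes are not handled. The preprocessor expands each {file} reference but wraps the expansion in markers, so a later stage can still tell where the text came from.

```python
import re

INCLUDE_RE = re.compile(r"\{([^{}]+)\}")
# Hypothetical marker syntax wrapping each expanded include.
MARKER_RE = re.compile(r"/\*@begin (.+?)\*/(.*?)/\*@end\*/", re.DOTALL)

def expand(source, files):
    """Expand {file} references, wrapping each expansion in markers."""
    def repl(m):
        name = m.group(1).strip()
        body = files[name].strip()
        return f"/*@begin {name}*/{body}/*@end*/"
    return INCLUDE_RE.sub(repl, source)

def include_spans(expanded):
    """List (include file, expanded text) pairs found in marked-up source."""
    return [(m.group(1), m.group(2)) for m in MARKER_RE.finditer(expanded)]

def collapse(expanded):
    """Recover the original source by folding each marked span back."""
    return MARKER_RE.sub(lambda m: "{" + m.group(1) + "}", expanded)

files = {"myinc.i": "7 * 14\n"}
source = "assign customer.age = {myinc.i}."
expanded = expand(source, files)
```

With the example from the question, expanded contains 7 * 14 between the two markers, include_spans(expanded) reports that it came from myinc.i, and collapse(expanded) gives back the original line. A grammar would then treat the markers as tokens (or the lexer would record them on the side) instead of throwing them away as comments.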
Upvotes: 0