Pieter van Ginkel

Reputation: 29632

Can I use ANTLR to parse code that has not been preprocessed?

I am about to write a parser for OpenEdge (a 4GL database language) and I would like to use ANTLR (or similar).

There are two reasons I think this may be a problem:

  1. OpenEdge is a 4GL database language which allows constructs like:

    assign
        customer.name = 'Customer name'
        customer.age = 20
    .
    

    Where the . at the end is the statement terminator, so this single statement combines the assignment of two database fields. OpenEdge has many more of these constructs;

  2. I need to preserve all details of the source files, so I cannot expand preprocessor statements before parsing the file. For example:

    // file myinc.i
    7 * 14
    
    // source.p
    assign customer.age = {myinc.i}.
    

    In the above example, I need to preserve the fact that customer.age was assigned using {myinc.i} instead of 7 * 14.

Can I use ANTLR to achieve this or do I need to write my own parser?

UPDATE:
I need this parser for code analysis, not to generate an executable. This is why the AST must record the fact that the include was used.

Upvotes: 4

Views: 782

Answers (5)

Hercules 888

Reputation: 21

The solution lies within OpenEdge Architect itself. You should check out the OpenEdge Architect jar files (C:\Progress\OpenEdge\oeide\eclipse\plugins\com.openedge.pdt.core_10.2.1.01\lib\progressparser.jar)

Here you will find the parser classes. They are tightly coupled to Eclipse, but I separated them from the Eclipse framework, and it works. The progressparser uses ANTLR, and the ANTLR grammar can be found in the following file: C:\Progress\OpenEdge\oeide\eclipse\plugins\com.openedge.pdt.core_10.2.1.01\oe_common_services.jar.

Inside that file you will find the ANTLR definition (look for openedge.g).

Good luck. If you want the separated eclipse environment just drop me a mail.

Upvotes: 2

Ira Baxter

Reputation: 95324

The issue with multiple assignments is easy enough to handle in a grammar. Just allow multiple assignments:

assign_stmt = 'assign' assignments '.' ;
assignments = ;
assignments = assignments target '=' expression ;

One method you can use is to augment the grammar to allow preprocessor token sequences wherever a nonterminal can be allowed, and simply not do preprocessor expansion. For your example, you have some grammar rule:

expression = ... ;

just add the rule:

expression = '{'  include_reference '}' ;

This works to the extent that the preprocessor isn't used abusively to generate several language elements that span nonterminal boundaries.
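The idea can be tried out without a parser generator. The following is a minimal hand-written sketch in Python, not ANTLR output and not taken from any real OpenEdge parser: the node classes (`IncludeRef`, `Literal`, `Assignment`) and the function names are hypothetical, and an "expression" is reduced to a single literal or a single include reference. It only shows that `{myinc.i}` can be lexed as one token and accepted wherever an expression is allowed, without ever being expanded.

```python
import re
from dataclasses import dataclass

# Hypothetical node types for this sketch only.
@dataclass
class IncludeRef:
    filename: str

@dataclass
class Literal:
    text: str

@dataclass
class Assignment:
    target: str
    value: object

TOKEN = re.compile(r"""
    \s*(?:
        (?P<include>\{[^{}]*\})      # an include reference, kept as one token
      | (?P<string>'[^']*')
      | (?P<number>\d+)
      | (?P<name>[A-Za-z_][\w-]*(?:\.[A-Za-z_][\w-]*)*)
      | (?P<punct>[=.])
    )""", re.VERBOSE)

def tokenize(src):
    """Split the source into (kind, text) pairs."""
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected text at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_assign(src):
    """assign_stmt = 'assign' { target '=' expression } '.'"""
    tokens = tokenize(src)
    if not tokens or tokens[0] != ("name", "assign"):
        raise SyntaxError("expected 'assign'")
    i, assignments = 1, []
    while i < len(tokens) and tokens[i] != ("punct", "."):
        if i + 2 >= len(tokens):
            raise SyntaxError("incomplete assignment")
        kind, target = tokens[i]
        if kind != "name" or tokens[i + 1] != ("punct", "="):
            raise SyntaxError("expected target '='")
        kind, text = tokens[i + 2]
        if kind == "include":
            # The extra alternative: expression = '{' include_reference '}'
            value = IncludeRef(text[1:-1].strip())
        elif kind in ("string", "number"):
            value = Literal(text)
        else:
            raise SyntaxError("expected expression")
        assignments.append(Assignment(target, value))
        i += 3
    if i >= len(tokens):
        raise SyntaxError("missing '.' terminator")
    return assignments
```

Calling `parse_assign("assign customer.age = {myinc.i}.")` yields a single `Assignment` whose value is an `IncludeRef` for `myinc.i` rather than a literal, which is exactly the information the question wants to keep.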

What kind of code analysis do you intend to do? To do pretty much anything, you'll need name and type resolution, which will require expanding the preprocessor directives. In that case, you'll need a more sophisticated scheme, because you need the expanded tree to do the name resolution, and need the include information kept off to the side.

Our DMS Software Reengineering Toolkit has an OpenEdge parser, in which we presently do the previous "keep the include file references" trick. DMS's C parser adds a "macro node" to the tree, whose child nodes contain the tree as you expect it, plus the reference information that refers back to the macro definition (an OpenEdge "include" is just a funny way to write a macro definition). This requires some careful organization, and lots of special handling of macro nodes where they occur.
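To make the "macro node" idea concrete, here is a small Python sketch. It is not how DMS implements it; the class names, the `INCLUDES` dictionary standing in for files on disk, and the tiny product-only expression parser are all assumptions made for illustration. The point is that one node carries both the expanded subtree (for analysis) and the original reference (for reporting and regenerating source).

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Mul:
    left: object
    right: object

@dataclass
class MacroNode:
    reference: str      # include name exactly as written in the source
    expansion: object   # tree parsed from the include's text

# Hypothetical in-memory stand-in for include files on disk.
INCLUDES = {"myinc.i": "7 * 14"}

def parse_product(text):
    """Parse 'n * n * ...' into a left-nested Mul tree (all this example needs)."""
    factors = [Num(int(part)) for part in text.split("*")]
    tree = factors[0]
    for factor in factors[1:]:
        tree = Mul(tree, factor)
    return tree

def parse_operand(text):
    """Parse either an include reference or a plain product."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        name = text[1:-1].strip()
        return MacroNode(name, parse_product(INCLUDES[name]))
    return parse_product(text)

def evaluate(node):
    """Analysis looks through macro nodes at the expanded tree."""
    if isinstance(node, MacroNode):
        return evaluate(node.expansion)
    if isinstance(node, Mul):
        return evaluate(node.left) * evaluate(node.right)
    return node.value

def unparse(node):
    """Source regeneration stops at macro nodes and prints the reference."""
    if isinstance(node, MacroNode):
        return "{" + node.reference + "}"
    if isinstance(node, Mul):
        return f"{unparse(node.left)} * {unparse(node.right)}"
    return str(node.value)
```

With `node = parse_operand("{myinc.i}")`, `evaluate(node)` gives 98 while `unparse(node)` gives back `{myinc.i}`. Every tree walker has to decide which of the two views it wants, which is the "special handling" cost mentioned above.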

Upvotes: 0

Abe Voelker

Reputation: 31564

Are you aware that there is already an open source parser for OpenEdge / Progress 4GL? It is called Proparse, written using ANTLR (originally it was hand-coded in OpenEdge itself, but eventually converted to ANTLR). It is written in Java, but I think you can run it in C# by using IKVM.

The license is the Eclipse license, so it is business-friendly.

Upvotes: 1

Bart Kiers

Reputation: 170138

Just to clarify: ANTLR isn't a parser, but a parser generator.

You either write your own parser for the language, or you write an (ANTLR) grammar for it, and let ANTLR generate the lexer and parser for you. You can mix custom code in your grammar to keep track of your assignments.

So, the answer is: yes, you can use ANTLR.

Note that I am unfamiliar with OpenEdge, but SQL dialects are usually tough to write parsers or grammars for. Have a look at the ANTLR wiki to see that it's no trivial task to write one from the ground up. You didn't mention it, but I assume you've looked at existing parsers that can parse your language?

FYI: you might already have it, but here's a link to the documentation including a BNF grammar for the OpenEdge SQL dialect: http://www.progress.com/progress/products/documentation/docs/dmsrf/dmsrf.pdf

Upvotes: 3

SK-logic

Reputation: 9714

You can do the same thing the C preprocessor does: extend your grammar with some sort of pragmas that set a source location, and let your preprocessor generate code stuffed with those pragmas.
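A rough Python sketch of that approach follows. The marker syntax (`/*@begin name*/` and `/*@end*/`), the `INCLUDES` dictionary, and the function names are all invented for this example; a real grammar would instead treat the markers as tokens. Here a preprocessor wraps each expansion in markers, and a second pass strips them again while recording which character ranges of the expanded text came from which include.

```python
import re

# Hypothetical in-memory stand-in for include files on disk.
INCLUDES = {"myinc.i": "7 * 14"}

INCLUDE_REF = re.compile(r"\{\s*([^{}\s]+)\s*\}")
MARKER = re.compile(r"/\*@(begin|end) ?([^*]*)\*/")

def preprocess(src):
    """Expand each {file} reference, wrapped in begin/end markers (not recursive)."""
    def expand(m):
        name = m.group(1)
        return f"/*@begin {name}*/{INCLUDES[name]}/*@end*/"
    return INCLUDE_REF.sub(expand, src)

def strip_markers(expanded):
    """Return (plain_text, spans); each span is (start, end, include_name)."""
    out, spans, stack = [], [], []
    pos = length = 0
    for m in MARKER.finditer(expanded):
        chunk = expanded[pos:m.start()]
        out.append(chunk)
        length += len(chunk)
        if m.group(1) == "begin":
            stack.append((length, m.group(2)))   # remember where the include starts
        else:
            start, name = stack.pop()
            spans.append((start, length, name))  # close the innermost open include
        pos = m.end()
    out.append(expanded[pos:])
    return "".join(out), spans
```

For `assign customer.age = {myinc.i}.` the stripped text is the fully expanded statement, and the single recorded span covers exactly the `7 * 14` that came from `myinc.i`, so an analysis tool can map any node in that range back to the include.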

Upvotes: 0
