Using ANTLR to generate source code for another language

Question

Can ANTLR be used to parse source code in one language and generate source code in another language?

So far, looking at the generated listeners, I cannot see a way to separate the individual constructs (different statement blocks and so on) in order to build the corresponding constructs of the other language.

(program (programHeading program (identifier HelloWorld) ;) 
(block (procedureAndFunctionDeclarationPart (procedureOrFunctionDeclaration 
(procedureDeclaration procedure (identifier myprocedure) (formalParameterList ( 
(formalParameterSection (parameterGroup (identifierList (identifier x) , (identifier y)) : (typeIdentifier integer))) )) ; 
(block (compoundStatement begin (statements (statement (unlabelledStatement (simpleStatement (procedureStatement (identifier writeln) ( 
(parameterList (actualParameter (expression (simpleExpression (term (signedFactor (factor (variable (identifier x))))) + (term (signedFactor 
(factor (unsignedConstant (string ' : '))))) + (term (signedFactor (factor (variable (identifier y))))))))) ))))) ; 
(statement (unlabelledStatement (simpleStatement emptyStatement)))) end)))) ;) (compoundStatement begin (statements 
(statement (unlabelledStatement (simpleStatement (procedureStatement (identifier from))))) i := 1 to 10 do begin writeln ( i ) ;) end)) ; writeln ( 'Hello, World!' ) ; end .)

For example, in the tree above I cannot see how to tell where one begin ... end block starts and where another one ends.

Well, I could do something with stack helpers, but in that case I might as well parse the file line by line myself...

It is the parse result of this code:

program HelloWorld;

procedure myprocedure(x, y: integer);
begin
    writeln(x + ' : ' + y);
end;

begin
    from i := 1 to 10 do
    begin
        writeln(i);
    end;
    writeln('Hello, World!');
end.

Maybe I am just looking at this from the wrong side, or I do not understand something?

Ira Baxter · Accepted Answer

Yes, ANTLR can be used to translate a language L1 to a language L2. But this is nowhere near as easy as people make it sound.

Method 1: Build a parser for L1. Walk over the tree for the L1 instance. At each node, spit out translated text for L2 using ANTLR's string templates.
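As a rough illustration of Method 1 (a sketch, not ANTLR's API): the tree is modelled here as nested Python tuples rather than ANTLR parse-tree objects, and `string.Template` from the standard library stands in for StringTemplate. The node kinds `block` and `call`, and the brace-style target syntax, are made up for the example. Because the walk recurses through the tree, each nested block is handled by its own call, so block boundaries come from the tree shape and not from line positions.

```python
from string import Template

# Stand-in for a string template; "$name" and "$args" are filled per call node.
CALL = Template("$name($args);")

def emit(node, depth=0):
    """Return target-language text for a source node given as a nested tuple."""
    pad = "    " * depth
    kind = node[0]
    if kind == "block":            # ("block", stmt, stmt, ...)
        lines = [pad + "{"]
        lines += [emit(child, depth + 1) for child in node[1:]]
        lines.append(pad + "}")
        return "\n".join(lines)
    if kind == "call":             # ("call", name, arg, ...)
        return pad + CALL.substitute(name=node[1], args=", ".join(node[2:]))
    raise ValueError(f"unhandled node type: {kind}")

# Hypothetical tree for: begin begin writeln(i) end; writeln('Hello, World!') end
tree = ("block",
        ("block", ("call", "writeln", "i")),
        ("call", "writeln", "'Hello, World!'"))

text = emit(tree)
```

Here `text` holds the nested braces with the inner block indented one level deeper than the outer one.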

You'll find it hard to decide exactly what to generate, because what you generate at each spot depends on the context (surrounding declarations, code, and how you decided to translate other language constructs). Consider translating:

a+b

What you want to generate is likely to be pretty different if "a" is a number vs. if "a" is a string. You can't know that by looking at the "a" in that expression; you have to find and interpret the declaration for "a" (e.g., build and consult a symbol table).
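A minimal sketch of that dependency, assuming a flat symbol table that has already been filled from the declarations. The table contents and the `add` target operation are invented for illustration; `concat` is borrowed from the target notation used further down.

```python
# Hypothetical symbol table: variable name -> declared type.
SYMBOLS = {"a": "string", "b": "string", "n": "integer", "m": "integer"}

def translate_add(left, right, symbols):
    """Translate `left + right`, choosing the target operation from declared types."""
    if symbols[left] == "string" or symbols[right] == "string":
        return f"concat({left},{right})"   # string concatenation
    return f"add({left},{right})"          # numeric addition

as_strings = translate_add("a", "b", SYMBOLS)
as_numbers = translate_add("n", "m", SYMBOLS)
```

The same source text `x + y` yields different output depending only on what the table says, which is why the expression node alone is not enough.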

You will find it hard to generate good code, because the (string template) output is raw text and you cannot interpret that text easily to optimize it. For instance, imagine you want to translate:

x = y + "abc"
x = x + "def"

A simple translator (statement by statement) might produce (as pure text):

set(x,concat(y,"abc"))
set(x,concat(x,"def"))

But an optimizing translator would ideally produce:

set(x,concat(y,"abcdef"))

which it can only do by inspecting the output and realizing that the output can be optimized. (OK, in this example you might optimize the input first, but language differences may prevent that.)

This problem motivates...

Method 2: Build a parser for L1. Build a parser for L2. Walk over the L1 tree. At each node, build an L2 tree using the ANTLR-generated node constructors for L2. You'll have the same hard time deciding what to generate, for the same reasons, but now that you have the L2 tree, you can at least consider writing code to walk over the L2 tree and optimize it, e.g., implement L2-to-L2 tree transformations that achieve the above optimization.

When done, prettyprint the L2 tree. You can use ANTLR's string templates for L2 nodes to help.
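To make Method 2 concrete, here is a small sketch using plain Python tuples as both the L1 and L2 trees (an assumption for brevity; with ANTLR you would be working with generated tree classes instead). The node shapes and the helper names `to_l2`, `merge_appends`, and `show` are hypothetical. It translates the two assignments from above, runs one L2-to-L2 pass that folds an append of a literal into the preceding assignment, and then prettyprints.

```python
def expr_to_l2(expr):
    """Translate an L1 expression; "+" is assumed to be string concatenation."""
    if expr[0] == "+":
        return ("concat", expr_to_l2(expr[1]), expr_to_l2(expr[2]))
    return expr  # ("var", name) and ("str", text) carry over unchanged

def to_l2(stmt):
    """Translate ("assign", target, expr) into an L2 ("set", target, expr) node."""
    _, target, expr = stmt
    return ("set", target, expr_to_l2(expr))

def is_concat_with_literal(node):
    return node[0] == "concat" and node[2][0] == "str"

def merge_appends(stmts):
    """Fold set(x,concat(E,"a")); set(x,concat(x,"b")) into set(x,concat(E,"ab")).

    Assumes E has no side effects.
    """
    out = []
    for stmt in stmts:
        _, var, expr = stmt
        prev = out[-1] if out else None
        if (prev is not None and prev[1] == var
                and is_concat_with_literal(prev[2])
                and is_concat_with_literal(expr)
                and expr[1] == ("var", var)):
            merged = ("str", prev[2][2][1] + expr[2][1])
            out[-1] = ("set", var, ("concat", prev[2][1], merged))
        else:
            out.append(stmt)
    return out

def show(node):
    """Prettyprint an L2 node as text."""
    kind = node[0]
    if kind == "set":
        return f"set({node[1]},{show(node[2])})"
    if kind == "concat":
        return f"concat({show(node[1])},{show(node[2])})"
    if kind == "str":
        return f'"{node[1]}"'
    return node[1]  # ("var", name)

l1 = [("assign", "x", ("+", ("var", "y"), ("str", "abc"))),
      ("assign", "x", ("+", ("var", "x"), ("str", "def")))]
naive = [to_l2(s) for s in l1]
optimized = merge_appends(naive)
naive_text = [show(s) for s in naive]          # two set(...) lines
optimized_text = [show(s) for s in optimized]  # one merged set(...) line
```

The point of the design is that `merge_appends` matches on tree structure, which is not practical once the output exists only as raw text.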

Other Problems with pure parser-based translators

Both approaches suffer from a generic problem that pure parser generators like ANTLR have: they have no (easy) way to collect context information needed to implement a good translator. [This will be true of any pure parser generator you pick].
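For a sense of the extra machinery involved, here is a sketch of collecting scoped declarations during a walk. ANTLR 4's generated listeners do give you paired enter/exit callbacks per grammar rule, which is what marks where a block starts and ends, but the scope stack itself is code you write. Everything below is a plain-Python stand-in: the `ScopeCollector` class, the `walk` function, and the `block`/`decl`/`use` node kinds are hypothetical and not part of ANTLR.

```python
class ScopeCollector:
    """Hypothetical listener: tracks nested blocks and resolves identifier uses."""

    def __init__(self):
        self.scopes = [{}]    # innermost scope is last
        self.resolved = []    # (name, type or None) for each use seen

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

def walk(node, listener):
    """Depth-first walk that fires enter/exit callbacks around each block."""
    kind = node[0]
    if kind == "block":
        listener.enter_block()
        for child in node[1:]:
            walk(child, listener)
        listener.exit_block()
    elif kind == "decl":      # ("decl", name, type)
        listener.declare(node[1], node[2])
    elif kind == "use":       # ("use", name)
        listener.resolved.append((node[1], listener.lookup(node[1])))

tree = ("block",
        ("decl", "x", "integer"),
        ("block", ("decl", "x", "string"), ("use", "x")),
        ("use", "x"),
        ("use", "z"))
collector = ScopeCollector()
walk(tree, collector)
```

After the walk, `collector.resolved` records that the inner use of `x` sees the inner declaration, the outer use sees the outer one, and `z` is undeclared.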

You usually need other support, too, e.g., that prettyprinting machinery. Life is also more convenient if you can write down the transformations directly, e.g.,

 rule translate_square(t: term): product -> product
      " \t ** 2 " => " \t * \t "

rather than writing code to walk up and down the tree to figure out that the root is a "power" operator and the right child is the constant 2. You'll need to write hundreds of bits of code to translate all the constructs in L1 if L1 is a nontrivial language, and doing this procedurally gets tiresome pretty fast.
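For comparison, this is roughly what the procedural version of that one rule looks like, sketched over a tuple-based tree. The node names `power`, `product`, and `sum` and the function `rewrite_square` are assumptions for the example. Note that duplicating the operand is only safe when it has no side effects.

```python
def rewrite_square(node):
    """Rewrite every `t ** 2` subtree into `t * t`, working bottom-up."""
    if not isinstance(node, tuple):
        return node                      # leaf: identifier or constant
    op, *kids = node
    kids = [rewrite_square(k) for k in kids]
    # Check that the root is a power operator and the right child is the constant 2.
    if op == "power" and kids[1] == 2:
        return ("product", kids[0], kids[0])
    return (op, *kids)

# (a + b) ** 2 + c
before = ("sum", ("power", ("sum", "a", "b"), 2), "c")
after = rewrite_square(before)
```

Each additional construct needs another branch like the `power` check, which is where the bulk of a hand-written translator ends up.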

So, you can use any parser generator to build a translator from one language to another. That's surely better than just building a translator without a parser generator, which you can also do (many real compilers are built this way). But, while the parser generator helps some, it hardly makes a dent in the problem. You need a lot more machinery, and should expect to spend a lot more effort than just getting the parser to work (many real compilers pay this price to achieve their result).
