Reputation: 3277
I have been searching the net for the past few hours, trying to learn a simple example of using ANTLR.But I am having a hard time understanding the examples. Does any body have simple example that would output this in Java:
if my input is
printf("Hello World");
the output should be :
Hello World
and if my input is
inx =1;
it should give an error message.
I'm trying to create a c++ compiler(starting with the lexical up until the semantic part only) using java,and I would really like to know what I should do.
Upvotes: 0
Views: 6000
Reputation: 4633
From ANTLR here is the trivial example of parsing (and evaluating) an expression.
grammar Expr;
@header {
package test;
import java.util.HashMap;
}
@lexer::header {package test;}
@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog: stat+ ;
stat: expr NEWLINE {System.out.println($expr.value);}
| ID '=' expr NEWLINE
{memory.put($ID.text, new Integer($expr.value));}
| NEWLINE
;
expr returns [int value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += $e.value;}
| '-' e=multExpr {$value -= $e.value;}
)*
;
multExpr returns [int value]
: e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
;
atom returns [int value]
: INT {$value = Integer.parseInt($INT.text);}
| ID
{
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ) $value = v.intValue();
else System.err.println("undefined variable "+$ID.text);
}
| '(' e=expr ')' {$value = $e.value;}
;
ID : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
But like I mentioned in my comments, C++ is very hard to parse correctly. There are many ambiguities and requires * amount of look ahead (which ANTLR does provide). So doing this in any efficient form is complicated. That is why I recommend implementing something like PL/0 which was designed for students to write their first compiler for. Tiny BASIC is also a good start. Both of these can be implemented without using a tool like ANTLR by doing recursive descent. I have implemented both in under 1000 lines together (in C++ and C# respectively).
ANTLR is a great tool though, especially once you get your head wrapped around recursive descent you might want to upgrade to a more powerful parser. I recommend both of Terrence Parr's books, ANTLR Reference and Language Implementation Patterns. The ANTLR book will tell you everything (plus some) that you want to know about ANTLR. The second book will teach you all about parsers and compilers, from recursive descent to black-magic backtracking.
More resources from a similar SO question can be found here. And if you're into Lisp or Scheme, you can check out JScheme, it is written in Java (less than 1000 lines I believe).
Upvotes: 2
Reputation: 3255
Here is a grammar that almost does what you want:
grammar PrintLang;
sentence
: statement
;
statement
: functionCall '(' argument ')' ';'
{
if ($functionCall.funName.equals("printf")) {
System.out.println($argument.arg);
}
}
;
functionCall returns [String funName]
: ID
{ $funName = $ID.text; }
;
argument returns [String arg]
: STRING
{ $arg = $STRING.text; }
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
I generated this in AntlrWorks. All of the token rules were generated for me.
Here is the java file to test it.
import org.antlr.runtime.*;
public class PrintIt {
public static void main(String args[]) {
String inputString = "printf(\"HelloWorld\");";
// Create an input character stream from standard in
ANTLRStringStream input = new ANTLRStringStream(inputString);
// Create an ExprLexer that feeds from that stream
PrintLangLexer lexer = new PrintLangLexer(input);
// Create a stream of tokens fed by the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// Create a parser that feeds off the token stream
PrintLangParser plParser = new PrintLangParser(tokens);
try {
plParser.sentence();
} catch (Exception e) {
e.printStackTrace();
}
}
}
You'll note that this java code is almost a verbatim copy/paste from the Antlr website example (I don't believe I even changed the comments, which is why the comment refers to Standard in, but the code actually uses a String). And here is the command line I used to do it.
bash$ java -cp ./antlr-3.4-complete.jar org.antlr.Tool PrintLang.g
bash$ javac -cp ./:./antlr-3.4-complete.jar PrintIt.java
bash$ java -cp antlr-3.4-complete.jar:. PrintIt
"HelloWorld"
Oops, I forgot that the string I wanted to print isn't the matched token ("HelloWorld", including the quotes), it's the string within the quotes.
Also, you'll note that I hardcoded the lookup of printf as a string comparison. In reality, you'll want an environment that contains the symbols accessible at a given scope (related, see antlr's "scope" construct. More difficult, though sometimes useful: create an environment that you pass to each parsing rule).
Most important: find Bart Kiers answers by searching SO for more antlr questions. He posts excellent examples.
Upvotes: 5