fmsf

Reputation: 37137

How can I parse code to build a compiler in Java?

I need to write a compiler; it's a university homework assignment. The teacher told us that we can use any API we want for parsing the code, as long as it is a good one. That way we can focus more on the JVM code we will generate.

So yes, I'll write a compiler in Java to generate Java.

Do you know of any good API for this? Should I use regular expressions? I normally write my own parsers by hand, but that isn't advisable in this scenario.

Any help would be appreciated.

Upvotes: 6

Views: 11272

Answers (12)

Markus Jarderot

Reputation: 89171

Regular expressions are useful in a compiler, but only for recognizing tokens (i.e. not for recursive structures such as nested parentheses).
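To illustrate the point about tokens, here is a minimal sketch of a regex-based tokenizer that uses only java.util.regex. The token kinds and the small expression language are hypothetical. The regex recognizes only flat tokens such as "(" and ")". Checking that the parentheses are actually nested correctly is the parser's job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLexer {
    // Hypothetical token kinds for a tiny expression language (illustration only).
    enum Kind { NUMBER, IDENT, OP, LPAREN, RPAREN }

    record Token(Kind kind, String text) {}

    // One alternation with a named group per token kind; WS is matched but not emitted.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<WS>\\s+)|(?<NUMBER>\\d+)|(?<IDENT>[A-Za-z_]\\w*)|(?<OP>[+\\-*/=])|(?<LPAREN>\\()|(?<RPAREN>\\))");

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            // Restrict the matcher to the rest of the input and require a match right at pos.
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos + ": '" + src.charAt(pos) + "'");
            }
            if (m.group("WS") == null) {
                // Find which named group matched and emit a token of that kind.
                for (Kind k : Kind.values()) {
                    if (m.group(k.name()) != null) {
                        out.add(new Token(k, m.group()));
                        break;
                    }
                }
            }
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        tokenize("x = (12 + y1) * 3").forEach(System.out::println);
    }
}
```

A regex can tell you that "(" is an LPAREN, but no single regular expression can check arbitrarily deep nesting. That is why the structure is left to the next stage.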

The classic way of writing a compiler is to have a lexical analyzer for recognizing tokens, a syntax analyzer for recognizing structure, a semantic analyzer for recognizing meaning, an intermediate code generator, an optimizer, and finally a target code generator. Any of those steps can be merged or skipped entirely if that makes the compiler easier to write.
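Here is a minimal sketch of that pipeline. It assumes a hypothetical language of integer arithmetic with + - * / and parentheses. Some stages are merged and others skipped, as described above. There is a hand-written lexer and a recursive-descent parser that builds an AST. There is no semantic analysis and no optimizer. The code generator emits JVM-style stack instruction mnemonics (ldc, iadd, isub, imul, idiv) as plain text rather than a real class file. Records and pattern-matching instanceof require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyCompiler {
    // 1. Lexical analysis: split the source into integer and single-character operator tokens.
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < src.length(); ) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            } else if ("+-*/()".indexOf(c) >= 0) {
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
            tokens.add(src.substring(start, i));
        }
        return tokens;
    }

    // 2. Syntax analysis: AST node types plus a recursive-descent parser.
    interface Node {}
    record Num(int value) implements Node {}
    record BinOp(char op, Node left, Node right) implements Node {}

    static final class Parser {
        private final List<String> toks;
        private int pos = 0;
        Parser(List<String> toks) { this.toks = toks; }
        private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

        Node parse() {
            Node n = expr();
            if (pos != toks.size()) throw new IllegalStateException("Unexpected token: " + peek());
            return n;
        }
        // expr := term (('+' | '-') term)*   (the loop makes operators left-associative)
        private Node expr() {
            Node n = term();
            while (peek().equals("+") || peek().equals("-")) {
                char op = toks.get(pos++).charAt(0);
                n = new BinOp(op, n, term());
            }
            return n;
        }
        // term := factor (('*' | '/') factor)*
        private Node term() {
            Node n = factor();
            while (peek().equals("*") || peek().equals("/")) {
                char op = toks.get(pos++).charAt(0);
                n = new BinOp(op, n, factor());
            }
            return n;
        }
        // factor := NUMBER | '(' expr ')'
        private Node factor() {
            String t = peek();
            pos++;
            if (t.equals("(")) {
                Node n = expr();
                if (!peek().equals(")")) throw new IllegalStateException("Expected ')'");
                pos++;
                return n;
            }
            if (!t.isEmpty() && Character.isDigit(t.charAt(0))) return new Num(Integer.parseInt(t));
            throw new IllegalStateException("Unexpected token: '" + t + "'");
        }
    }

    // 3. Code generation: post-order walk emitting JVM-style stack mnemonics as plain text.
    static void gen(Node n, List<String> out) {
        if (n instanceof Num num) {
            out.add("ldc " + num.value());
        } else if (n instanceof BinOp b) {
            gen(b.left(), out);   // push the left operand
            gen(b.right(), out);  // push the right operand
            out.add(switch (b.op()) {
                case '+' -> "iadd";
                case '-' -> "isub";
                case '*' -> "imul";
                default -> "idiv";
            });
        }
    }

    static List<String> compile(String src) {
        List<String> code = new ArrayList<>();
        gen(new Parser(lex(src)).parse(), code);
        return code;
    }

    public static void main(String[] args) {
        // Prints ldc 2, ldc 3, ldc 4, iadd, imul (one per line).
        compile("2 * (3 + 4)").forEach(System.out::println);
    }
}
```

The parser folds operators in a loop, so 1 - 2 - 3 compiles to ldc 1, ldc 2, isub, ldc 3, isub, which computes (1 - 2) - 3. Producing an actual .class file would need a bytecode writer in place of the text output.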

There have been many tools developed to help with this process. For Java, you can look at

Upvotes: 12

nes1983

Reputation: 15756

Use a parser combinator library such as JParsec. There's a good video tutorial on how to use it.
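The following tiny hand-rolled version in plain Java shows the idea behind parser combinators. It is not JParsec's API. The names Parser, then, or, many, ch and number are hypothetical. Each small parser is a function from an input position to an optional result. Larger parsers are built by combining smaller ones with sequencing, choice and repetition. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class Combinators {
    // A successful parse: the produced value and the position just after it.
    record Result<T>(T value, int pos) {}

    // A parser maps (input, position) to a result, or to Optional.empty() on failure.
    @FunctionalInterface
    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);

        // Sequence: run this parser, then `next` where this one stopped, and combine both values.
        default <U, R> Parser<R> then(Parser<U> next, BiFunction<T, U, R> combine) {
            return (in, pos) -> parse(in, pos).flatMap(a ->
                next.parse(in, a.pos()).map(b ->
                    new Result<R>(combine.apply(a.value(), b.value()), b.pos())));
        }

        // Ordered choice: if this parser fails, try `other` at the same position.
        default Parser<T> or(Parser<T> other) {
            return (in, pos) -> {
                Optional<Result<T>> r = parse(in, pos);
                return r.isPresent() ? r : other.parse(in, pos);
            };
        }
    }

    // Primitive parser: exactly one given character.
    static Parser<Character> ch(char c) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) {
                return Optional.of(new Result<Character>(c, pos + 1));
            }
            return Optional.empty();
        };
    }

    // Primitive parser: one or more decimal digits, converted to an int.
    static Parser<Integer> number() {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
            if (end == pos) return Optional.empty();
            return Optional.of(new Result<Integer>(Integer.parseInt(in.substring(pos, end)), end));
        };
    }

    // Repetition: zero or more matches of p, collected into a list (always succeeds).
    static <T> Parser<List<T>> many(Parser<T> p) {
        return (in, pos) -> {
            List<T> items = new ArrayList<>();
            int cur = pos;
            Optional<Result<T>> r;
            // Stop on failure, or if p succeeded without consuming input (avoids an infinite loop).
            while ((r = p.parse(in, cur)).isPresent() && r.get().pos() > cur) {
                items.add(r.get().value());
                cur = r.get().pos();
            }
            return Optional.of(new Result<List<T>>(items, cur));
        };
    }

    // expr := number (('+' | '-') number)*, evaluated left to right while parsing.
    static final Parser<Integer> EXPR = number().then(
        many(ch('+').or(ch('-')).then(number(), (op, n) -> op == '+' ? n : -n)),
        (first, rest) -> first + rest.stream().mapToInt(Integer::intValue).sum());

    public static void main(String[] args) {
        Result<Integer> r = EXPR.parse("10+20-5", 0).orElseThrow();
        System.out.println(r.value() + " (consumed " + r.pos() + " chars)");  // 25 (consumed 7 chars)
    }
}
```

The grammar rule lives directly in the EXPR definition, which is the main appeal of the approach. This sketch neither skips whitespace nor requires the whole input to be consumed. A real library provides combinators for both.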

Upvotes: 0

gimel

Reputation: 86362

Go classic: Lex + Yacc. In Java, that spells JAX and JavaCC. JavaCC even comes with some ready-made Java grammars you can inspect.

Upvotes: 6

Dr. Hans-Peter Störr

Reputation: 25976

http://java-source.net/open-source/parser-generators and http://catalog.compilertools.net/java.html contain catalogs of tools for this. See also the Stack Overflow question "Alternatives to Regular Expressions".

Upvotes: 1

Peter Lawrey

Reputation: 533492

I suggest you look at the source for BeanShell. It has a compiler for Java and is fairly simple to read.

Upvotes: 1

Apocalisp

Reputation: 35054

I'd recommend using either a metacompiler like ANTLR, or a simple parser combinator library. Functional Java has a parser combinator API. There's also JParsec. Both of these are based on the Parsec library for Haskell.

Upvotes: 5

Michael Myers

Reputation: 191885

JFlex is a scanner generator which, according to the manual, is designed to work with the parser generator CUP.

One of the main design goals of JFlex was to make interfacing with the free Java parser generator CUP as easy as possibly [sic].

It also has support for BYACC/J, which, as its name suggests, is a port of Berkeley YACC to generate Java code.

I have used JFlex itself and liked it. However, the project I was doing was simple enough that I wrote the parser by hand, so I don't know how good either CUP or BYACC/J is.

Upvotes: 3

stepancheg

Reputation: 4276

Parser combinators are a good choice. A popular Java implementation is JParsec.

Upvotes: 2

Jonas Kölker

Reputation: 7837

I've used SableCC in my compiler course, though not by choice.

I remember finding it very bulky and heavyweight, with more emphasis on cleanliness than convenience (for example, there are no operator precedence declarations; you have to encode precedence in the grammar itself).
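By contrast, yacc lets you declare operator precedence and associativity separately from the grammar rules, using %left and %right. A hand-written counterpart of that table-driven approach is precedence climbing. The sketch below uses a hypothetical precedence table. It takes pre-split tokens (numbers and operators only, no parentheses, for brevity) and returns a fully parenthesized string, so the grouping is easy to see. This is not how SableCC works. In SableCC you would encode the same information by splitting the grammar into one rule per precedence level.

```java
import java.util.List;
import java.util.Map;

public class PrecedenceClimbing {
    // Hypothetical precedence table: a higher number binds more tightly.
    private static final Map<String, Integer> PREC = Map.of("+", 1, "-", 1, "*", 2, "/", 2, "^", 3);
    // '^' is treated as right-associative; all other operators are left-associative.
    private static final String RIGHT_ASSOC = "^";

    private final List<String> toks;
    private int pos = 0;

    private PrecedenceClimbing(List<String> toks) { this.toks = toks; }

    // Parses the tokens and returns a fully parenthesized string showing the grouping.
    static String parse(List<String> toks) {
        PrecedenceClimbing p = new PrecedenceClimbing(toks);
        String result = p.expr(1);
        if (p.pos != toks.size()) throw new IllegalStateException("Trailing input at token " + p.pos);
        return result;
    }

    private String operand() {
        if (pos >= toks.size()) throw new IllegalStateException("Expected an operand at end of input");
        String t = toks.get(pos++);
        if (PREC.containsKey(t)) throw new IllegalStateException("Expected an operand but found " + t);
        return t;
    }

    // Parse an operand, then keep absorbing operators whose precedence is at least minPrec.
    private String expr(int minPrec) {
        String lhs = operand();
        while (pos < toks.size() && PREC.containsKey(toks.get(pos)) && PREC.get(toks.get(pos)) >= minPrec) {
            String op = toks.get(pos++);
            int prec = PREC.get(op);
            // Left-associative: the right operand may only contain tighter-binding operators.
            // Right-associative: it may also contain operators of the same precedence.
            int nextMin = op.equals(RIGHT_ASSOC) ? prec : prec + 1;
            lhs = "(" + lhs + op + expr(nextMin) + ")";
        }
        return lhs;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("1", "+", "2", "*", "3")));   // (1+(2*3))
        System.out.println(parse(List.of("1", "-", "2", "-", "3")));   // ((1-2)-3)
        System.out.println(parse(List.of("2", "^", "3", "^", "2")));   // (2^(3^2))
    }
}
```

With a table like this, adding an operator means adding one map entry. In a stratified grammar it means adding or reshuffling a rule, which is the inconvenience described above.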

I'd probably want to use something else if I had the choice. My experiences with yacc (for C) and happy (for Haskell) have both been pleasant.

Upvotes: 2

snemarch

Reputation: 5008

If you're going to go hardcore, throw in a bit of http://llvm.org in the mix :)

Upvotes: 1

Vineet Reynolds

Reputation: 76709

I would recommend ANTLR, primarily because of its output generation capabilities via StringTemplate.

Better still, Terence Parr's book on ANTLR is one of the best books on writing compilers with a parser generator.

Then there is ANTLRWorks, which lets you study and debug your grammar on the fly.

To top it all off, the ANTLR wiki and documentation (although not as comprehensive as I'd like) are a good place for any beginner to start. They helped me refresh my knowledge of compiler writing in a week.

Upvotes: 10

tddmonkey

Reputation: 21184

Have a look at JavaCC, a parser generator for Java. It's very easy to use and to get the hang of.

Upvotes: 9
