Reputation: 970
Is it possible to create an AST for any arbitrary programming language or IR using C or C++ alone (without the help of tools like YACC and LEX )?
If so, how to implement the lexical and syntactic analysis ?
If not, what are the tools that have to augmented to C or C++ to successfully create an AST ?
Hope I made my doubt clear. If My question looks vague or out of context, please indicate the required.
P.S : I Am actually trying to create the AST for the LLVM's .ll format of IR representation. I do know that .ll is derived from AST. But I am trying out static analysis practices. So I am looking at creating the AST.
Upvotes: 3
Views: 2117
Reputation: 95306
I doubt there's a one-to-one mapping between your arbitrary langauge and LLVM's ASTs. That means it is likely that you really want to do this in two stages:
Parse your 'arbitrary language' using the best parsing tools you can get to simplify the problem of parsing your language. Use that to build an AST for your language, following standard methods for parser generators producing ASTs. LEX/YACC are OK but there are plenty of good alternatives out there. Its pretty likely you'll need to build a symbol table.
Walk the AST of the your parsed langauge to build your LLVM AST. This won't be one-to-one, but the ability to look around the tree near a tree node in your AST to collect information need to generate the LLVM code will likely be extremely helpful.
This is a classic style for a simple compiler.
I suggest you read the Aho/Ullman Dragon book on syntax directed translation. A day's worth of education will save you months of wasted engineering time.
Upvotes: 0
Reputation: 58547
The most straight-forward methodology for creating the parser without a parser-generator is recursive descent. It is very well documented - the standard book in the field is The Dragon Book.
A scanner, that takes text as input and produces a string of tokens as output, can be written using standard string manipulation techniques.
Upvotes: 2