Reputation: 458
This question was asked to me in an interview question:
Write a code to generate the parse tree like compilers do internally for any given expression. For example:
a+(b+c*(e/f)+d)*g
Upvotes: 1
Views: 6025
Reputation: 11933
Simple way out is to convert your expression into postfix notation (abcef/*++) & then refer to the answer to this question (http://stackoverflow.com/questions/423898/postfix-notation-to-expression-tree) for converting the postfix expression to a Tree.
This is what the interviewer expected :)
Upvotes: 3
Reputation: 10797
Start by defining the language. No one can implement a parser or a compiler to a language that isn't very well defined. You give an example: 'a+(b+c*(e/f)+d)*g', which should trigger the following questions:
Once you are equipped with a good definition of the language syntax, start with implementing a tokenizer. A tokenizer gets a stream of characters and generates a list of tokens. In the example above, each character is a token, but in var*12 (var power 12) there are 3 tokens: 'var', '*' and '12'. If regular expression is permitted, it is possible you can do this part of the parsing with regular expressions.
Next, have a function that identify each token by type: is it an operator, is it a variable, a number literal, string literal, etc. Package all in a method called NextToken that returns a token and its type.
Finally, start parsing. In your sample above the root of the parsing tree will be a node with the '+' operator (which has precedence over the ''). The left child is a variable token 'a' and the right child is a tree with a root element the '' token. Work recursively.
Upvotes: 3
Reputation: 26104
Whenever you intend to write a parser, the main question to ask is if you want to do it manually, or to use a parser generator framework.
In this case, I would say that it's a good exercise to write it all yourself.
Start with a good representation for the tree itself. This will be be output of your algorithm. For example, this could be a collection of objects, where one object kind could represent a "label" like a
, b
, and c
in your example. Others could represent numbers. You could then defined a representation of operators, for example +
is a binary operator, which would have two subobjects, representing the left and right subexpression.
Next step is the actual parser, I would suggest a classical recursive decent parser. One text describing this, and provides a standard pseudo-code implementation is this text by Theodore Norvell
Upvotes: 1
Reputation: 308813
I'd start with a simple grammar, something like those used by ANTLR and JavaCC.
Upvotes: 0