vikas mehta
vikas mehta

Reputation: 458

Write a code to generate the parse tree

This question was asked to me in an interview question:

Write a code to generate the parse tree like compilers do internally for any given expression. For example:

a+(b+c*(e/f)+d)*g 

Upvotes: 1

Views: 6025

Answers (4)

Yavar
Yavar

Reputation: 11933

Simple way out is to convert your expression into postfix notation (abcef/*++) & then refer to the answer to this question (http://stackoverflow.com/questions/423898/postfix-notation-to-expression-tree) for converting the postfix expression to a Tree.

This is what the interviewer expected :)

Upvotes: 3

Uri London
Uri London

Reputation: 10797

Start by defining the language. No one can implement a parser or a compiler to a language that isn't very well defined. You give an example: 'a+(b+c*(e/f)+d)*g', which should trigger the following questions:

  1. Is the language a single expression, or there may be multiple statements (separated by ';' maybe?
  2. what are the 'a', 'b', ... 'g' tokens? Is it variable? What is the syntax of variables? Is it a C-like variable, or is it a single alphanumeric character as your example may imply.
  3. There are 3 binary expression in your example. Is that all? Does the language also support '-'. Does your language support logical, and bit-wise operators?
  4. Does the language support number literals? integer only? double? Does the language support string literals? Do you quote string literals?
  5. Syntax for comments?
  6. Which operator has precedence? Does '*' operator has precedence over '+' as in the example? Does operands evaluated right to left or left to right?
  7. Any Pre-processing?

Once you are equipped with a good definition of the language syntax, start with implementing a tokenizer. A tokenizer gets a stream of characters and generates a list of tokens. In the example above, each character is a token, but in var*12 (var power 12) there are 3 tokens: 'var', '*' and '12'. If regular expression is permitted, it is possible you can do this part of the parsing with regular expressions.

Next, have a function that identify each token by type: is it an operator, is it a variable, a number literal, string literal, etc. Package all in a method called NextToken that returns a token and its type.

Finally, start parsing. In your sample above the root of the parsing tree will be a node with the '+' operator (which has precedence over the ''). The left child is a variable token 'a' and the right child is a tree with a root element the '' token. Work recursively.

Upvotes: 3

Lindydancer
Lindydancer

Reputation: 26104

Whenever you intend to write a parser, the main question to ask is if you want to do it manually, or to use a parser generator framework.

In this case, I would say that it's a good exercise to write it all yourself.

Start with a good representation for the tree itself. This will be be output of your algorithm. For example, this could be a collection of objects, where one object kind could represent a "label" like a, b, and c in your example. Others could represent numbers. You could then defined a representation of operators, for example + is a binary operator, which would have two subobjects, representing the left and right subexpression.

Next step is the actual parser, I would suggest a classical recursive decent parser. One text describing this, and provides a standard pseudo-code implementation is this text by Theodore Norvell

Upvotes: 1

duffymo
duffymo

Reputation: 308813

I'd start with a simple grammar, something like those used by ANTLR and JavaCC.

Upvotes: 0

Related Questions