ChocolateBear
ChocolateBear

Reputation: 415

Parsing an arithmetic expression and building a tree from it in Java

I needed some help with creating custom trees given an arithmetic expression. Say, for example, you input this arithmetic expression:

(5+2)*7

The result tree should look like:

    *
   / \
  +   7
 / \
5   2

I have some custom classes to represent the different types of nodes, i.e. PlusOp, LeafInt, etc. I don't need to evaluate the expression, just create the tree, so I can perform other functions on it later. Additionally, the negative operator '-' can only have one child, and to represent '5-2', you must input it as 5 + (-2).

Some validation on the expression would be required to ensure each type of operator has the correct the no. of arguments/children, each opening bracket is accompanied by a closing bracket.

Also, I should probably mention my friend has already written code which converts the input string into a stack of tokens, if that's going to be helpful for this.

I'd appreciate any help at all. Thanks :)

(I read that you can write a grammar and use antlr/JavaCC, etc. to create the parse tree, but I'm not familiar with these tools or with writing grammars, so if that's your solution, I'd be grateful if you could provide some helpful tutorials/links for them.)

Upvotes: 40

Views: 66336

Answers (4)

Bill K
Bill K

Reputation: 62769

Assuming this is some kind of homework and you want to do it yourself..

I did this once, you need a stack

So what you do for the example is:

    parse    what to do?                Stack looks like
      (      push it onto the stack     (
      5      push 5                     (, 5
      +      push +                     (, 5, +
      2      push 2                     (, 5, +, 2
      )      evaluate until (           7            
      *      push *                     7, *
      7      push 7                     +7, *, 7
      eof    evaluate until top         49

The symbols like "5" or "+" can just be stored as strings or simple objects, or you could store the + as a +() object without setting the values and set them when you are evaluating.

I assume this also requires an order of precedence, so I'll describe how that works.

in the case of: 5 + 2 * 7

you have to push 5 push + push 2 next op is higher precedence so you push it as well, then push 7. When you encounter either a ) or the end of file or an operator with lower or equal precedence you start calculating the stack to the previous ( or the beginning of the file.

Because your stack now contains 5 + 2 * 7, when you evaluate it you pop the 2 * 7 first and push the resulting *(2,7) node onto the stack, then once more you evaluate the top three things on the stack (5 + *node) so the tree comes out correct.

If it was ordered the other way: 5 * 2 + 7, you would push until you got to a stack with "5 * 2" then you would hit the lower precedence + which means evaluate what you've got now. You'd evaluate the 5 * 2 into a *node and push it, then you'd continue by pushing the + and 3 so you had *node + 7, at which point you'd evaluate that.

This means you have a "highest current precedence" variable that is storing a 1 when you push a +/-, a 2 when you push a * or / and a 3 for "^". This way you can just test the variable to see if your next operator's precedence is < = your current precedence.

if ")" is considered priority 4 you can treat it as other operators except that it removes the matching "(", a lower priority would not.

Upvotes: 56

knils
knils

Reputation: 115

the given expression (5+2)*7 we can take as infix

Infix  :     (5+2)*7
Prefix :     *+527

from the above we know the preorder and inorder taversal of tree ... and we can easily construct tree from this. Thanks,

Upvotes: 1

Ray Weidner
Ray Weidner

Reputation: 131

I wanted to respond to Bill K.'s answer, but I lack the reputation to add a comment there (that's really where this answer belongs). You can think of this as a addendum to Bill K.'s answer, because his was a little incomplete. The missing consideration is operator associativity; namely, how to parse expressions like:

49 / 7 / 7

Depending on whether division is left or right associative, the answer is:

49 / (7 / 7) => 49 / 1 => 49

or

(49 / 7) / 7 => 7 / 7 => 1

Typically, division and subtraction are considered to be left associative (i.e. case two, above), while exponentiation is right associative. Thus, when you run into a series of operators with equal precedence, you want to parse them in order if they are left associative or in reverse order if right associative. This just determines whether you are pushing or popping to the stack, so it doesn't overcomplicate the given algorithm, it just adds cases for when successive operators are of equal precedence (i.e. evaluate stack if left associative, push onto stack if right associative).

Upvotes: 13

Konstantin Komissarchik
Konstantin Komissarchik

Reputation: 29139

Several options for you:

  1. Re-use an existing expression parser. That would work if you are flexible on syntax and semantics. A good one that I recommend is the unified expression language built into Java (initially for use in JSP and JSF files).

  2. Write your own parser from scratch. There is a well-defined way to write a parser that takes into account operator precedence, etc. Describing exactly how that's done is outside the scope of this answer. If you go this route, find yourself a good book on compiler design. Language parsing theory is going to be covered in the first few chapters. Typically, expression parsing is one of the examples.

  3. Use JavaCC or ANTLR to generate lexer and parser. I prefer JavaCC, but to each their own. Just google "javacc samples" or "antlr samples". You will find plenty.

Between 2 and 3, I highly recommend 3 even if you have to learn new technology. There is a reason that parser generators have been created.

Also note that creating a parser that can handle malformed input (not just fail with parse exception) is significantly more complicated that writing a parser that only accepts valid input. You basically have to write a grammar that spells out the various common syntax errors.

Update: Here is an example of an expression language parser that I wrote using JavaCC. The syntax is loosely based on the unified expression language. It should give you a pretty good idea of what you are up against.

Contents of org.eclipse.sapphire/plugins/org.eclipse.sapphire.modeling/src/org/eclipse/sapphire/modeling/el/parser/internal/ExpressionLanguageParser.jj

Upvotes: 3

Related Questions