Reputation: 3017

How do I optimize this postfix expression tree for speed?

Thanks to the help I received in this post:

I have a nice, concise recursive function to traverse a tree in postfix order:

deque <char*> d;
void Node::postfix()
{
    if (left != __nullptr) { left->postfix(); } 
    if (right != __nullptr) { right->postfix(); } 
    d.push_front(cargo);
    return;
};

This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array.

char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"};
char *ops[4]={"+","-","*","/"};

As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic which I will ask in separate postings.

The first is: how can I get access to each 'cargo' as it is found. That is: instead of pushing 'cargo' onto a deque, and then processing the deque to get the value, I'd like to start processing it right away.

Edit: This question suggests that processing the deque afterwards is a better way.

I don't yet know about parallel processing in c++, but this would ideally be done concurrently on two different processors.

In python, I'd make the function a generator and access succeeding 'cargo's using .next().

See the above Edit.

But I'm using c++ to speed up the python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any Ideas? Thanks

Upvotes: 4

Answers (3)

Ira Baxter

Reputation: 95334

Of course, you'd want first to measure the cost overhead before you bother with optmization here, as your genetic algorithm next-generation production and mutations may swamp the evaluation time.

Once you've determined you want to optimize... the obvious answer is to compile the expression ("as much as possible"). Fortunately, there's lots of ways to "compile".

If you are implementing this in Python, you may be able to ask Python (i'm not an expert) to compile a constructed abstract syntax tree into a function, and that might be a lot faster, especially if CPython supports this.

It appears that you are implementing this in C++, however. In that case, I wouldn't evaluate the expression tree as you have defined it, as it means lots of tree walking, indirect function calls, etc. which are pretty expensive.

One cheesy trick is to spit out the actual expression as a text string with appropriate C++ function body text around it into a file, and launch a C++ compiler on that. You can automate all the spit-compile-relink with enough script magic, so that if you do this rarely, this would work and you would get expression evaluation as fast as the machine can do it.

Under the assumption you don't want to do that, I'd be tempted to walk your expression tree before you start the evaluation process, and "compile" that tree into a set of actions stored in a linear array called "code". The actions would be defined by an enum:

enum actions {
   // general actions first
   pushx,  // action to push x on a stack
   push1,
   push2,  // action to push 2 on a stack
   ...
   pushN,
   add,
   sub,  
   mul,    // action multiply top two stack elements together
   div,
   ...
   // optimized actions
   add1,
   sub1,
   mul1,
   div1,   // action to divide top stack element by 1
   ...
   addN,
   subN,
   ...
   addx,
   subX,
   ...
 }

In this case, I've defined the actions to implement a push-down stack expression evaluator, because that's easy to understand. Fortunately your expression vocabulary is pretty limited, so your actions can also be pretty limited (they'd be more complex if you had arbitrary variables or constants).

The expression ((x*2.0)+x)-1 would be executed by the series of actions

 pushx
 mul2
 addx
 sub1

Its probably hard to get a lot better than this.

One might instead define the actions to implement a register-oriented expression evaluator following the model of a multi-register CPU; that would enable even faster execution (I'd guess by a factor of two, but only if the expression got really complex).

What you want are actions that cover the most general computation you need to do (so you can always choose a valid actions sequence regardless of your original expression) and actions that occur frequently in the expressions you encounter (add1 is pretty typical in machine code, don't know what your statistics are like, and your remark that you are doing genetic programming suggests you don't know what the statistics will be, but you can measure them somehow or make and educated guess).

Your inner loop for evaluation would then look like (sloppy syntax here):

float stack[max_depth];
stack_depth=0;
for (i=1;i<expression_length;i++)
{
   switch (code[i])  // one case for each opcode in the enum
   {
    case pushx: stack[stack_depth++]=x;
               break;
    case push1: stack[stack_depth++]=1;
               break;
           ...
    case add:   stack[stack_depth-1]+=stack[stack_depth];
               stack_depth--;
               break;
           ...
    case subx:  stack[stack_depth]-=x;
               break;
           ...
    }
}
// stack[1] contains the answer here

The above code implements a very fast "threaded interpreter" for a pushdown stack expression evaluator.

Now "all" you need to do is to generate the content of the code array. You can do that by using your original expression tree, executing your original recursive expression tree walk, but instead of doing the expression evaluation, write the action that your current expression evaluator would do into the code array, and spitting out special case actions when you find them (this amounts to "peephole optimization"). This is classic compilation from trees, and you can find out a lot more about how to do this in pretty much any compiler book.

Yes, this is all a fair bit of work. But then, you decided to run a genetic algorithm, which is computationally pretty expensive.

Upvotes: 3

Mark B

Reputation: 96241

Assuming that processing a cargo is expensive enough that locking a mutex is relatively cheap, you can use a separate thread to access the queue as you put items on it.

Thread 1 would execute your current logic, but it would lock the queue's mutex before adding an item and unlock it afterwards.

Then thread 2 would just loop forever, checking the size of the queue. If it's not empty, then lock the queue, pull off all available cargo and process it. Repeat loop. If no cargo available sleep for a short period of time and repeat.

If the locking is too expensive you can build up a queue of queues: First you put say 100 items into a cargo queue, and then put that queue into a locked queue (like the first example). Then start on a new "local" queue and continue.

Upvotes: 1

Goz

Reputation: 62323

Lots of good suggestions in this thread for speeding up tree iteration:

Tree iterator, can you optimize this any further?

As for the problem I guess you could process Cargo in a different thread but seeing as you aren't actually doing THAT much. You would probably end up spending more time in thread synchronisation mechanisms that doing any actual work.

You may find instead of just pushing it into the deque that if you just process as you go along you may have things run faster. Yuo may find processing it all in a seperate loop at the end is faster. Best way to find out is try both methods with a variety of different inputs and time them.

Upvotes: 2

How do I optimize this postfix expression tree for speed?

Answers (3)

Related Questions