Kfozli
Kfozli

Reputation: 31

Traversal on a tree with duplicated subtrees

I have a simple recursive post-order traversal on a tree that prints all the possible paths. The problem is that it takes a lot of time.

I would like to use the property of my tree in order to save the traversal time. the property is that I have duplicated subtree, in the example below, the subtree with the head 1 appear in the tree 3 times.

          10
  /        |        \
 5         15        1
 /\        / \      / \
2  1      1   6    3   8
  / \    / \
 3   8  3   8

I looking for improvement for my traversal that skips the sub trees that already been passed. The idea is to store every subtree that i've passed, but I couldn't fit it to the post order algorithm.

Thanks for help.

Upvotes: 3

Views: 572

Answers (4)

Milan Patel
Milan Patel

Reputation: 422

Assuming every node represents a unique subtree rooted at that node and set of node values is not much big, you could use this to optimise traversal:

string s[MAX_NODE_VALUE + 1];

string post_traverse(Node root)
{
    if(root == NULL)
        return "";
    if(s[root.value].length() == 0)
    {
        s[root.value] += post_traverse(root.left);
        s[root.value] += post_traverse(root.right);
        s[root.value] += itoa(root.value);
        s[root.value] += ' ';
    }
    cout << s[root.value];
    return s[root.value];
}

This will only help when you have too many repeated subtrees and set of node values is very small, otherwise it will consume too much space and take more time.

Upvotes: 0

gen-y-s
gen-y-s

Reputation: 881

Use a hash set to store which nodes have been visited, and for each node you visit, check if it's already been visited: if not, add it to the visited set, and proceed as usual, otherwise return.

Upvotes: 1

Petr
Petr

Reputation: 9997

that prints all the possible paths

I think this is the main problem. The running time of your program is linear in the amount of data to output, roughly for each step in tree your program does it should output at least one symbol. So you have no room for speedup here as long as you keep the required output the same (i.e. as long as you require all paths to be output), you just can not have an algorithm that is faster than O(R), where R is the total size of output you need to produce.

In fact, most probably the main bottleneck is output itself (console or disk performance is usually much worse than just walking in memory), so I think that if you profile your program, you will find that 90% of time is spent in output. Therefore not only you can not get a better asymptotically solution, you can not get a faster solution at all. (Except that if you optimize the output itself, but that's a different question.)

However, if you do not need to print all paths, but process them in some way — for example, count how much of them exist, or find the longest path, etc — then your solution can greately benefit from the fact that you have repeating subtrees. Actually, this means that you have not a tree, but a directed acyclic graph, and this usually allows for a rather simple dynamic programming approaches.

Upvotes: 0

Shindou
Shindou

Reputation: 506

I don't know if your program allows this change....Since we can not know if a subtree has appeared before when traversing, one possible way is to reorganize your tree as following to share the repeated subtree.

          10
  /        |        \
 5         15        1
 /\        / \      / \
2  1 ------   6    3   8
  / \     
 3   8     

Upvotes: 0

Related Questions