CJH
CJH

Reputation: 1594

Searching All Nodes in a Tree Structure efficiently

I am working with a tree type structure that consists of a single 'root'TreeNodeDefinition record which can contain amongst other things a list of other TreeNodeDefinition classes, each of these can then contain a further List and so on and so forth.

I basically want to be able to traverse ALL nodes within the tree structure and check if a condition is met per node then that definition record is added to a list. I have come up with a method to do this but I cant help but think there is a much more efficient way of doing it:

List<ITreeNodeDefinition> treeNodeDefinitions = new List<ITreeNodeDefinition>();

treeNodeDefinitions = treeFactory.SearchNodesByRelationshipId(treeNodeDefinition, relationshipId, new List<ITreeNodeDefinition>());

where the first parameter is my Root node definition record, the second parameter is what I need to compare each node by and lastly I am passing in an Empty list that I want to populate each time a node matches my check. The method is as follows:

    public List<ITreeNodeDefinition> SearchNodesByRelationshipId(ITreeNodeDefinition treeNodeDefinition, int? relationshipId, List<ITreeNodeDefinition> tndList)
    {
        if (treeNodeDefinition.RelationshipId == relationshipId)
        {
            tndList.Add(treeNodeDefinition);
        }

        if (treeNodeDefinition.Nodes.Count != 0)
        {
            foreach (ITreeNodeDefinition nodeDefinition in treeNodeDefinition.Nodes)
            {
                List<ITreeNodeDefinition> tempTable = this.SearchNodesByRelationshipId(nodeDefinition, relationshipId, tndList);
            }
        }
        return tndList;
    }

As you can see, the method is calling itself for each sub-node found in the treeNodeDefinition.Nodes list. This is returning to a tempTable that I never do anything with... This stinks of inefficiency to me. i was wondering if there is a more direct way of navigating this kind of structure... I sure I am just missing a trick.

Upvotes: 2

Views: 2458

Answers (2)

Maxim Kitsenko
Maxim Kitsenko

Reputation: 2333

Looking to the question/code i think that the problem relates not to one thing (like you said redundant tempTable variable) but it relates to several areas, I'll try highlight these areas/problems.

1) Theory. The first thing you need to know before iterating the tree - is that there are 2 ways of iterating trees 'breadth-first' and 'depth-first'. They can be implemented via 'recursion' and 'loop'. There are a lot of articles about it.

I suggest you to read few articles about these approaches, some of them:

2) Problem that you noticed not significant from the performance pov. Yes, you are right at when you say storing the result of the previous call into 'tempTable' is not very good. Returning the 'tempTable' has no big impact on performance or memory since 'tempTamble' is referenced to the same object as 'tndList'. Returning method parameter does not bring 'inefficiency'. The only one thing it impacts on is - not clean code and few bytes in the stack. Really you don't need to return anything in your method. Why do you return the list?

I suggest you to read about value and reference types. Some materials

I changed your you code a little bit, now it returns void:

public void SearchNodesByRelationshipId(ITreeNodeDefinition treeNodeDefinition, int? relationshipId, List<ITreeNodeDefinition> tndList)
{
    if (treeNodeDefinition.RelationshipId == relationshipId)
    {
        tndList.Add(treeNodeDefinition);
    }

    if (treeNodeDefinition.Nodes.Count != 0)
    {
        foreach (ITreeNodeDefinition nodeDefinition in treeNodeDefinition.Nodes)
        {
            this.SearchNodesByRelationshipId(nodeDefinition, relationshipId, tndList);
        }
    }
}

3) Another problem. Significant. The way you are doing iterating - is 'depth-first' recursion. This approach is not robust, potentially it leads to 'StackOverflowException' . Because of long chain of recursive method calls.

I suggest you to read about iteration and recursion algorithms in the context of trees, and implement iteration approach.

Just for information: There is another way to avoid 'stackOverfrwException' with 'recursion' approach - tail recursion, but afaik, there is no such mechanism in C#, but such a mechanism exists in F# and other functional languages.

Briefly how iterative approach works, in pseudocode:

put the root to the collection X (which is queue for '*breadth-first*' and stack for '*depth-first*')
Do while X is not empty
    var currentNode = get next node from X
    process current root (do checks that you need, aggregate data etc.)
    get child nodes of the currentNode, save them into X

Upvotes: 0

Parrish Husband
Parrish Husband

Reputation: 3178

You could approach this with an explicit stack and avoid recursion altogether:

public static IEnumerable<ITreeNodeDefinition> DepthFirstSearch(ITreeNodeDefinition root, int? relationshipId)
{
    var stack = new Stack<ITreeNodeDefinition>();
    stack.Push(root);
    while(stack.Count > 0)
    {
        var current = stack.Pop();
        if (current.RelationshipId == relationshipId)
            yield return current;

        foreach(var node in current.Nodes)
            stack.Push(node);
    }
}

To add onto this though, it might be simpler and more flexible to not hard-code in the relationship id check at all, and just filter the results down afterward if you find you're commonly trying to traverse the tree structure:

var matches = treeFactory.Traverse(root)
                         .Where(t => t.RelationshipId == 5)
                         .ToList();

Expanding on this, by using a search predicate you could build this search functionality into the method the same way LINQ does, which you could implement like this:

public static IEnumerable<ITreeNodeDefinition> DepthFirstSearch(ITreeNodeDefinition root, Func<ITreeNodeDefinition, bool> predicate)
{
    var stack = new Stack<ITreeNodeDefinition>();
    stack.Push(root);
    while (stack.Count > 0)
    {
        var current = stack.Pop();
        if (predicate(current))
            yield return current;

        foreach (var node in current.Nodes)
            stack.Push(node);
    }
}

The benefit is you aren't hard-coding in one specific search case when traversing. Calling this with the RelationshipId search would be:

var matches = treeFactory.DepthFirstSearch(root, t => t.RelationshipId == 5)
                         .ToList();

For completeness, here's an example of breadth-first searching. Note the difference in Queue<T> vs Stack<T> in regards to how the traversal order changes:

public static IEnumerable<ITreeNodeDefinition> BreadthFirstSearch(ITreeNodeDefinition root, Func<ITreeNodeDefinition, bool> predicate)
{
    var queue = new Queue<ITreeNodeDefinition>();
    queue.Enqueue(root);
    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        if (predicate(current))
            yield return current;

        foreach (var node in current.Nodes)
            queue.Enqueue(node);
    }
}

For your use case, there is likely no advantage to either one except for breadth-first being ordered in a way that is generally better when listing the nodes.

Upvotes: 2

Related Questions