Pingpong
Pingpong

Reputation: 8009

Linq ToList() execution

I want to build a tree structure like this:

 root  Id 1
   child id 2
     grandChild  id 3

Code sample below. If I use GetChildrenNodesCorrect(), I get the correct result. But when GetChildrenNodesWrong() is used, it returns below:

root  Id 1
   child id 2
     Null

I know that ToList() is not deferred execution, and returns result immediatelly. Could anyone explain this?

 public class ToListTest
    {
       public static void Entry()
       {
           var toListTest = new ToListTest();

           toListTest.Test();

       }

       public void Test()
       {
           List<Node> newsList = new List<Node>
               {
                   new Node{Id = 1, ParentId = 0},
                   new Node{Id = 2, ParentId = 1},
                   new Node{Id = 3, ParentId = 2}
               };

          var root = BuildUpTree(newsList);
       }


        private TreeNode BuildUpTree(List<Node> newsList)
        {
            var root = new TreeNode { currentNode = newsList.First(n => n.ParentId == 0) };

            BuildUpTreeChildrenNodes(newsList, root);

            return root;
        }



        private void BuildUpTreeChildrenNodes(List<Node> newsList, TreeNode currentTreeNode)
        {

            currentTreeNode.Children = GetChildrenNodesWrong(newsList, currentTreeNode);

            foreach (var node in currentTreeNode.Children)
            {                
                BuildUpTreeChildrenNodes(newsList, node);    
            }


        }

        private  IEnumerable<TreeNode> GetChildrenNodesWrong(List<Node> newsList, TreeNode cuurentNode)
        {
            return newsList.Where(n => n.ParentId == cuurentNode.currentNode.Id)
                                .Select(n => new TreeNode
                                {
                                    currentNode = n
                                });
        }

        private IEnumerable<TreeNode> GetChildrenNodesCorrect(List<Node> newsList, TreeNode cuurentNode)
        {
            return GetChildrenNodesWrong(newsList, cuurentNode).ToList();
        }

       public class TreeNode
       {

           public Node currentNode { get; set; }
           public IEnumerable<TreeNode> Children { get; set; }

       }

       public class Node
       {

           public int Id { get; set; }
           public int ParentId { get; set; }

       }
    }

Update

In debug, when using GetChildrenNodesWrong(), root has both child and grandchild before the method returns. After the method returns, root has only child, and grandchild is null.

Update 2

IMO, the problem might not be related to clean code. But anyone is welcome to show more intuitive code.

Upvotes: 3

Views: 509

Answers (2)

evanmcdonnal
evanmcdonnal

Reputation: 48076

I'm not entirely sure what you're asking. All LINQ queries have deferred execution, when you call ToList() you're simply forcing the query to execute. I think the main problem is in your where clause. Only two objects satisfy the condition so the IEnumerable returned by your LINQ query should only have 2 objects.

It's not doing what you expect because the LINQ query in GetChildrenNodesWrong is producing an "off by one" error. Here is basically what happens;

1) we feed it root for n = root nothing happens. We move to the next node.

2) n.Id = 1, the where condition is met by node 2 as it's parentId is 1. We allocate a new node, point current to node 2

3) We get to the third node now. n.ParentId = 2 and current.Id = 2. We have a match so we allocated another node and point current to node 3.

4) we're at the end of the list. Grand child is never allocated because we're off by one.

Basically you iterate x time where x is the length of the list but because current = n on the first iteration you don't allocate a node so you end up with x -1 nodes when you're expecting x.

Upvotes: 0

Ben Reich
Ben Reich

Reputation: 16324

Every time the IEnumerable is evaluated the Linq query is re-executed. So, when you're computing the tree, it is allocating space for nodes but not assigning them to any permanent variable. This means that in the foreach loop in BuildUpTreeChildrenNodes, you are not calling the recursive function on the instance of the node you want. Instead, you're calling it on a re-instantiated version of the node that has been created by the foreach loop (which enumerates the IEnumerable). When you call ToList on the IEnumerable instead, then the foreach loop will return the elements of the list, which is in memory.

If you make root public static, and then debug your code, you'll see that when you call BuildUpTreeChildrenNodes, the node argument is not the instance of the node that you want. Even though it has the same ID and represents the same node in the graph, it is not actually connected in any way to the root node. Check:

root.Children.Any(n => n.Id == node.Id) //true
root.Children.Contains(node) //false

The simplest way to see your problem is here:

//Create a singleton Node list:
var nodeSingleton= Enumerable.Range(0, 1).Select(x => new Node { Id = x });
Console.Write(nodeSingleton.Single() == nodeSingleton.Single());

You might expect this to return true, but in fact it will be false - both times the Single Linq method is called, the deferred execution of the singleton variable is re-evaluated, and returns a different instance of the Node class.

If you call ToList on the singleton, however, then you get the list in memory and the Single method will return the same instance of the Node.

More broadly, I think the problem with this code is that it mixes up imperative and functional code too much. It is strange that so many of the methods are void and then the GetChildrenNodesWrong method is not. I think you should pick a style and stick with it, since switching paradigms can be confusing.

Upvotes: 1

Related Questions