Linq ToList() execution

Question

I want to build a tree structure like this:

 root  Id 1
   child id 2
     grandChild  id 3

Code sample below. If I use GetChildrenNodesCorrect(), I get the correct result. But when GetChildrenNodesWrong() is used, it returns below:

root  Id 1
   child id 2
     Null

I know that ToList() is not deferred execution, and returns result immediatelly. Could anyone explain this?

 public class ToListTest
    {
       public static void Entry()
       {
           var toListTest = new ToListTest();

           toListTest.Test();

       }

       public void Test()
       {
           List newsList = new List
               {
                   new Node{Id = 1, ParentId = 0},
                   new Node{Id = 2, ParentId = 1},
                   new Node{Id = 3, ParentId = 2}
               };

          var root = BuildUpTree(newsList);
       }


        private TreeNode BuildUpTree(List newsList)
        {
            var root = new TreeNode { currentNode = newsList.First(n => n.ParentId == 0) };

            BuildUpTreeChildrenNodes(newsList, root);

            return root;
        }



        private void BuildUpTreeChildrenNodes(List newsList, TreeNode currentTreeNode)
        {

            currentTreeNode.Children = GetChildrenNodesWrong(newsList, currentTreeNode);

            foreach (var node in currentTreeNode.Children)
            {                
                BuildUpTreeChildrenNodes(newsList, node);    
            }


        }

        private  IEnumerable GetChildrenNodesWrong(List newsList, TreeNode cuurentNode)
        {
            return newsList.Where(n => n.ParentId == cuurentNode.currentNode.Id)
                                .Select(n => new TreeNode
                                {
                                    currentNode = n
                                });
        }

        private IEnumerable GetChildrenNodesCorrect(List newsList, TreeNode cuurentNode)
        {
            return GetChildrenNodesWrong(newsList, cuurentNode).ToList();
        }

       public class TreeNode
       {

           public Node currentNode { get; set; }
           public IEnumerable Children { get; set; }

       }

       public class Node
       {

           public int Id { get; set; }
           public int ParentId { get; set; }

       }
    }

Update

In debug, when using GetChildrenNodesWrong(), root has both child and grandchild before the method returns. After the method returns, root has only child, and grandchild is null.

Update 2

IMO, the problem might not be related to clean code. But anyone is welcome to show more intuitive code.

Ben Reich · Accepted Answer

Every time the IEnumerable is evaluated the Linq query is re-executed. So, when you're computing the tree, it is allocating space for nodes but not assigning them to any permanent variable. This means that in the foreach loop in BuildUpTreeChildrenNodes, you are not calling the recursive function on the instance of the node you want. Instead, you're calling it on a re-instantiated version of the node that has been created by the foreach loop (which enumerates the IEnumerable). When you call ToList on the IEnumerable instead, then the foreach loop will return the elements of the list, which is in memory.

If you make root public static, and then debug your code, you'll see that when you call BuildUpTreeChildrenNodes, the node argument is not the instance of the node that you want. Even though it has the same ID and represents the same node in the graph, it is not actually connected in any way to the root node. Check:

root.Children.Any(n => n.Id == node.Id) //true
root.Children.Contains(node) //false

The simplest way to see your problem is here:

//Create a singleton Node list:
var nodeSingleton= Enumerable.Range(0, 1).Select(x => new Node { Id = x });
Console.Write(nodeSingleton.Single() == nodeSingleton.Single());

You might expect this to return true, but in fact it will be false - both times the Single Linq method is called, the deferred execution of the singleton variable is re-evaluated, and returns a different instance of the Node class.

If you call ToList on the singleton, however, then you get the list in memory and the Single method will return the same instance of the Node.

More broadly, I think the problem with this code is that it mixes up imperative and functional code too much. It is strange that so many of the methods are void and then the GetChildrenNodesWrong method is not. I think you should pick a style and stick with it, since switching paradigms can be confusing.

Linq ToList() execution

Answers (2)

Related Questions