I have a case where I want to iterate all but the last 2 elements of a collection. Let's say I do it a weird way like x.Reverse().Skip(2).Reverse() . Will each LINQ operation generate effectively a nested iterator or cause enumeration, etc or is it smarter than that? What happens under the hood in this sort of case? Clarification: this is just one example of chained LINQ statements you might see, where a developer favours short powerful code without thinking too hard about performance - maybe they are a Computer Science student and this seems the 'cleverest' solution. I am not asking how to solve this particular example

How efficient are chained LINQ statements?

Reputation: 30464

Whenever you want to know how a LINQ statement works, how efficient it is, it might be a good idea to look at the source code (google: reference source Enumerable reverse)

Here you'll find that as soon as you start enumerating your sequence (this is: use a non-deferred method = use a LINQ method that does not return IEnumerable, or use foreach), the first Reverse will enumerate your complete sequence once, but it in a buffer, and start iteratingbackwards from the last element.

Your skip(2) will only enumerate 2 elements.

The 2nd Reverse will make a new Buffer, containing these two elements and start iterating backwards: so in your original sequence forwards.

If you look what happens: the elements of your original sequence are put in a buffer, the last and the pre-last elements are put in a second buffer. This second buffer is iterated: pre-last then last element.

So every element it iterated once, the last two elements are iterated once again. If iterating is a lot of work, consider creating a List, then take the last two elements. This will iterate your elements only once.

If you have other LINQ statements of which you wonder how it is done, lool at the source code

Upvotes: 2

juharr

Reputation: 32266

First off yes that is creating a "iterator" and does not actually do any iterating until you materialized the query in a foreach or by calling ToList on it. When you do that the number of iterations that occur are based on the underlying type. Reverse will create a buffer array for whatever source you give it and iterate over it backwards. If the source is ICollection<T> then it will use its CopyTo method to populate the array which should usually result in a single bulk copy of contiguous data in constant time. If it's not a ICollection<T> then it will iterate the source into the buffer and then iterate over it backwards. With that in mind here's what happens for your specific query when you iterate.

First the last Reverse will start iterating its source (which is not an ICollection<T>).

Then the Skip will start iterating its source

Then the first Reverse will either do a CopyTo if its source is a ICollection<T> or it will iterate the source into a buffer array that it resizes as needed.

Then the first Reverse will iterate over its buffer array backwards

Then the Skip will take the results skipping the first two and yielding the rest

Then the last Reverse will take the result and add them to its buffer array and resize it as needed.

Finally the last Reverse will iterate the buffer array backwards.

So if you're dealing with a ICollecion<T> that's one CopyTo and then 1 iteration of all of the values and then 1 iteration of all but 2 of the values. If it's not a ICollection<T> that's basically 3 iterations of the values (really the last iteration is of all but 2). And either way it's also using two intermediate arrays in the process.

And to prove that the query does no iterations until you materialize it you can check out this example

void Main()
{
    var query = MyValues().Reverse().Skip(2).Reverse();
    Console.WriteLine($"After query before materialization");
    var results = query.ToList();
    Console.WriteLine(string.Join(",", results));
}

public IEnumerable<int> MyValues()
{
    for(int i = 0; i < 10; i ++)
    {
        Console.WriteLine($"yielding {i}");
        yield return i;
    }
}

Which produces the output

After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
0,1,2,3,4,5,6,7

When compared to the other example you had x.Take(x.Count() - 2), that will iterate the source before you materialize it once for the Count (unless it's ICollection or ICollection<T> in which case it will just use the Count property) then it will iterate it again when you materialize it.

Here's the same example with the different code and the resulting output.

void Main()
{
    var x = MyValues();
    var query = x.Take(x.Count() - 2);
    Console.WriteLine($"After query before materialization");
    var results = query.ToList();
    Console.WriteLine(string.Join(",", results));
}

public IEnumerable<int> MyValues()
{
    for(int i = 0; i < 10; i ++)
    {
        Console.WriteLine($"yielding {i}");
        yield return i;
    }
}

With the output

yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
0,1,2,3,4,5,6,7

So which one is better is completely dependent on the source. For a ICollection<T> or ICollection the Take and Count would be preferred (unless the source might change between when the query is created and when it is materialized), but if it's neither of those you might prefer the double Reverse to avoid iterating the source twice (especially if the source can change between when you create the query and actually materialize it as the size might change as well), but that has to be weighted against the increase in total iterations done and memory usage.

Upvotes: 2

How efficient are chained LINQ statements?

Answers (3)

Related Questions