Reputation: 63728
I have a case where I want to iterate all but the last 2 elements of a collection.
Let's say I do it a weird way like x.Reverse().Skip(2).Reverse()
.
Will each LINQ operation generate effectively a nested iterator or cause enumeration, etc or is it smarter than that? What happens under the hood in this sort of case?
Clarification: this is just one example of chained LINQ statements you might see, where a developer favours short powerful code without thinking too hard about performance - maybe they are a Computer Science student and this seems the 'cleverest' solution. I am not asking how to solve this particular example
Upvotes: 4
Views: 902
Reputation: 30464
Whenever you want to know how a LINQ statement works, how efficient it is, it might be a good idea to look at the source code (google: reference source Enumerable reverse)
Here you'll find that as soon as you start enumerating your sequence (this is: use a non-deferred method = use a LINQ method that does not return IEnumerable, or use foreach), the first Reverse will enumerate your complete sequence once, but it in a buffer, and start iteratingbackwards from the last element.
Your skip(2) will only enumerate 2 elements.
The 2nd Reverse will make a new Buffer, containing these two elements and start iterating backwards: so in your original sequence forwards.
If you look what happens: the elements of your original sequence are put in a buffer, the last and the pre-last elements are put in a second buffer. This second buffer is iterated: pre-last then last element.
So every element it iterated once, the last two elements are iterated once again. If iterating is a lot of work, consider creating a List, then take the last two elements. This will iterate your elements only once.
If you have other LINQ statements of which you wonder how it is done, lool at the source code
Upvotes: 2
Reputation: 32266
First off yes that is creating a "iterator" and does not actually do any iterating until you materialized the query in a foreach
or by calling ToList
on it. When you do that the number of iterations that occur are based on the underlying type. Reverse
will create a buffer array for whatever source you give it and iterate over it backwards. If the source is ICollection<T>
then it will use its CopyTo
method to populate the array which should usually result in a single bulk copy of contiguous data in constant time. If it's not a ICollection<T>
then it will iterate the source into the buffer and then iterate over it backwards. With that in mind here's what happens for your specific query when you iterate.
First the last Reverse
will start iterating its source (which is not an ICollection<T>
).
Then the Skip
will start iterating its source
Then the first Reverse will either do a CopyTo
if its source is a ICollection<T>
or it will iterate the source into a buffer array that it resizes as needed.
Then the first Reverse will iterate over its buffer array backwards
Then the Skip will take the results skipping the first two and yielding the rest
Then the last Reverse will take the result and add them to its buffer array and resize it as needed.
Finally the last Reverse will iterate the buffer array backwards.
So if you're dealing with a ICollecion<T>
that's one CopyTo
and then 1 iteration of all of the values and then 1 iteration of all but 2 of the values. If it's not a ICollection<T>
that's basically 3 iterations of the values (really the last iteration is of all but 2). And either way it's also using two intermediate arrays in the process.
And to prove that the query does no iterations until you materialize it you can check out this example
void Main()
{
var query = MyValues().Reverse().Skip(2).Reverse();
Console.WriteLine($"After query before materialization");
var results = query.ToList();
Console.WriteLine(string.Join(",", results));
}
public IEnumerable<int> MyValues()
{
for(int i = 0; i < 10; i ++)
{
Console.WriteLine($"yielding {i}");
yield return i;
}
}
Which produces the output
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
0,1,2,3,4,5,6,7
When compared to the other example you had x.Take(x.Count() - 2)
, that will iterate the source before you materialize it once for the Count
(unless it's ICollection
or ICollection<T>
in which case it will just use the Count
property) then it will iterate it again when you materialize it.
Here's the same example with the different code and the resulting output.
void Main()
{
var x = MyValues();
var query = x.Take(x.Count() - 2);
Console.WriteLine($"After query before materialization");
var results = query.ToList();
Console.WriteLine(string.Join(",", results));
}
public IEnumerable<int> MyValues()
{
for(int i = 0; i < 10; i ++)
{
Console.WriteLine($"yielding {i}");
yield return i;
}
}
With the output
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
0,1,2,3,4,5,6,7
So which one is better is completely dependent on the source. For a ICollection<T>
or ICollection
the Take
and Count
would be preferred (unless the source might change between when the query is created and when it is materialized), but if it's neither of those you might prefer the double Reverse
to avoid iterating the source twice (especially if the source can change between when you create the query and actually materialize it as the size might change as well), but that has to be weighted against the increase in total iterations done and memory usage.
Upvotes: 2
Reputation: 3319
Most LINQ operation do not produce separate nested iteration. Though Count()
has to iterate thru complete sequence.
As for the content of your question, please refer to: How to take all but the last element in a sequence using LINQ?
Upvotes: 1