Reputation: 512
Using yield
keyword we are able to only calculate the needed amount of items in the IEnumerable
. I have set up a test, in which I was generating a number of items and then I wanted to generate one more. What I expected was the function to start where it ended, in this case on the 10000000
iteration and only iterate once more to 10000001
. Please check out this piece of code:
public static void Main()
{
var naturalNumbers = GetNaturalNumbers();
var executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000000));
Console.WriteLine($"Time elapsed: {executionTime}");
executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000001));
Console.WriteLine($"Time elapsed: {executionTime}");
}
public static IEnumerable<int> GetNaturalNumbers()
{
Console.WriteLine("Running GetNaturalNumbers() from the beginning");
for(int value = 0;; value++)
{
yield return value;
}
}
public static System.TimeSpan GetExecutionTime(Action action)
{
var stopwatch = Stopwatch.StartNew();
action();
stopwatch.Stop();
return stopwatch.Elapsed;
}
Output:
Running GetNaturalNumbers() from the beginning
Time elapsed: 00:00:00.0618548
Running GetNaturalNumbers() from the beginning
Time elapsed: 00:00:00.0562454
The time elapsed in the second call is lower and made me guess that it's only because processor optimization. The function is called twice from the beginning.
What's the real reason behind this output? Is it possible to continue the iteration from the point at which we ended in the first call?
Edit:
Now I see that my example isn't good enough. Suppose we have a very complex and time consuming operation in the GetNaturalNumbers()
function. I was expecting for the second call to start the enumeration from the 10000000
'th element and not need to calculate all of it once again. I expected a significant performance improvement in the second call.
Another example:
Imagine a book with many pages, in which you don't want to get all of the pages at once, but only the amount that the reader needs. When the reader turns the page we would get it without having to calculate all the previous ones.
Upvotes: 3
Views: 927
Reputation: 1614
You can create your own ElementAt extension method instead. It's an improvement over @DavidG's answer.
static class IEnumerableExtension
{
public static T ElementAtRetainPos<T>(this IEnumerable<T> source, EnumeratorWithPos<T> enumeratorWithPos, int elementToFind)
{
if (elementToFind < enumeratorWithPos.EnumeratorPosition)
{
enumeratorWithPos.Enumerator = source.GetEnumerator();
}
for (var i = enumeratorWithPos.EnumeratorPosition; i < elementToFind; i++)
{
enumeratorWithPos.Enumerator.MoveNext();
}
enumeratorWithPos.EnumeratorPosition = elementToFind;
return enumeratorWithPos.Enumerator.Current;
}
public static EnumeratorWithPos<T> GetEnumeratorWithPos<T>(this IEnumerable<T> source)
{
return new EnumeratorWithPos<T>()
{
Enumerator = source.GetEnumerator()
};
}
}
public class EnumeratorWithPos<T> : IDisposable
{
private IEnumerator<T> enumerator;
public IEnumerator<T> Enumerator
{
get
{
return enumerator;
}
set
{
enumerator = value;
EnumeratorPosition = -1;
}
}
public int EnumeratorPosition { get; set; }
public void Dispose()
{
Enumerator.Dispose();
}
}
How to use:
public static void Main()
{
var naturalNumbers = GetNaturalNumbers();
using (var enumeratorWithPos = naturalNumbers.GetEnumeratorWithPos())
{
var executionTime = GetExecutionTime(() => naturalNumbers.ElementAtRetainPos(enumeratorWithPos, 10000000));
Console.WriteLine($"Time elapsed: {executionTime}");
executionTime = GetExecutionTime(() => naturalNumbers.ElementAtRetainPos(enumeratorWithPos, 10000001));
Console.WriteLine($"Time elapsed: {executionTime}");
// force reset of enumerator by moving back to the previous
executionTime = GetExecutionTime(() => naturalNumbers.ElementAtRetainPos(enumeratorWithPos, 10000001));
Console.WriteLine($"Time elapsed: {executionTime}");
}
}
public static IEnumerable<int> GetNaturalNumbers()
{
Console.WriteLine("Running GetNaturalNumbers() from the beginning");
for(int value = 0;; value++)
{
yield return value;
}
}
public static System.TimeSpan GetExecutionTime(Action action)
{
var stopwatch = Stopwatch.StartNew();
action();
stopwatch.Stop();
return stopwatch.Elapsed;
}
First time it takes 00:00:00.1565686
Second request takes 00:00:00.0001373
Upvotes: 0
Reputation: 18013
Each ElementAt
calls are independent ones. All IEnumerable
extension methods are implemented so that they obtain a new enumerator.
If you use ReSharper you might also get a warning saying
Possible multiple enumerations of IEnumerable
naturalNumbers
On the other hand, the ElementAt
implementation really tries to do its best to retrieve the result quickly if possible. If the collection implements the IList<T>
interface, then it will call its indexer in the first place. Otherwise, it uses the ultimate fallback mechanism by obtaining the enumerator.
What's the real reason behind this output?
Just ignore the first result, which also includes JIT time. To me the result seems to be ok.
Is it possible to continue the iteration from the point at which we ended in the first call?
Obtain the enumerator manually as per @DavidG's answer.
Upvotes: 1
Reputation: 118937
You are trying to not iterate over the same enumerable twice. To do this, you need to get a bit more manual with your handling of the values. Calling GetNaturalNumbers
twice will give you two distinct enumerators, meaning your second operation just loops over all 10000001 items. You could do something like this which gets the IEnumerator<T>
for the enumerable and manually moves over the incoming stream of numbers:
var enumerable = GetNaturalNumbers();
var enumerator = enumerable.GetEnumerator();
var fistElementToFind = 10000000;
for(var i = 0; i <= fistElementToFind; i++)
{
enumerator.MoveNext();
}
var firstValue = enumerator.Current;
enumerator.MoveNext();
var nextValue = enumerator.Current;
Console.WriteLine($"First value: {firstValue}");
Console.WriteLine($"Next value: {nextValue}");
Upvotes: 2
Reputation: 5836
Everyone else seems to be talking about the speed, which doesn't seem to be your real question.
You are thinking that after you execute:
var executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000000));
And then execute:
executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000001));
That it is supposed to know you already did it for 10000000
so it should only do 1 iteration of the loop.
That's simply not how it works. Your yield return will cut short the loop and return the value you need, but both of these statements are completely independent and it is going to run:
for(int value = 0;; value++)
Both times.
Upvotes: 5