Rasmond
Rasmond

Reputation: 512

Continue enumeration on IEnumerable

Using yield keyword we are able to only calculate the needed amount of items in the IEnumerable. I have set up a test, in which I was generating a number of items and then I wanted to generate one more. What I expected was the function to start where it ended, in this case on the 10000000 iteration and only iterate once more to 10000001. Please check out this piece of code:

public static void Main()
{
    var naturalNumbers = GetNaturalNumbers();

    var executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000000));
    Console.WriteLine($"Time elapsed: {executionTime}");

    executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000001));
    Console.WriteLine($"Time elapsed: {executionTime}");
}

public static IEnumerable<int> GetNaturalNumbers()
{
    Console.WriteLine("Running GetNaturalNumbers() from the beginning");
    for(int value = 0;; value++)
    {
        yield return value;
    }
}

public static System.TimeSpan GetExecutionTime(Action action)
{
    var stopwatch = Stopwatch.StartNew();

    action();

    stopwatch.Stop();
    return stopwatch.Elapsed;
}

Output:

Running GetNaturalNumbers() from the beginning
Time elapsed: 00:00:00.0618548
Running GetNaturalNumbers() from the beginning
Time elapsed: 00:00:00.0562454

The time elapsed in the second call is lower and made me guess that it's only because processor optimization. The function is called twice from the beginning.

What's the real reason behind this output? Is it possible to continue the iteration from the point at which we ended in the first call?

Edit:

Now I see that my example isn't good enough. Suppose we have a very complex and time consuming operation in the GetNaturalNumbers() function. I was expecting for the second call to start the enumeration from the 10000000'th element and not need to calculate all of it once again. I expected a significant performance improvement in the second call.


Another example:

Imagine a book with many pages, in which you don't want to get all of the pages at once, but only the amount that the reader needs. When the reader turns the page we would get it without having to calculate all the previous ones.

Upvotes: 3

Views: 927

Answers (4)

Mark Cilia Vincenti
Mark Cilia Vincenti

Reputation: 1614

You can create your own ElementAt extension method instead. It's an improvement over @DavidG's answer.

static class IEnumerableExtension
{
    public static T ElementAtRetainPos<T>(this IEnumerable<T> source, EnumeratorWithPos<T> enumeratorWithPos, int elementToFind)
    {
        if (elementToFind < enumeratorWithPos.EnumeratorPosition)
        {
            enumeratorWithPos.Enumerator = source.GetEnumerator();
        }
        for (var i = enumeratorWithPos.EnumeratorPosition; i < elementToFind; i++)
        {
            enumeratorWithPos.Enumerator.MoveNext();
        }
        enumeratorWithPos.EnumeratorPosition = elementToFind;
        return enumeratorWithPos.Enumerator.Current;
    }

    public static EnumeratorWithPos<T> GetEnumeratorWithPos<T>(this IEnumerable<T> source)
    {
        return new EnumeratorWithPos<T>()
        {
            Enumerator = source.GetEnumerator()
        };
    }
}

public class EnumeratorWithPos<T> : IDisposable
{
    private IEnumerator<T> enumerator;

    public IEnumerator<T> Enumerator
    {
        get
        {
            return enumerator;
        }
        set
        {
            enumerator = value;
            EnumeratorPosition = -1;
        }
    }
    public int EnumeratorPosition { get; set; }

    public void Dispose()
    {
        Enumerator.Dispose();
    }
}

How to use:

public static void Main()
{
    var naturalNumbers = GetNaturalNumbers();
    using (var enumeratorWithPos = naturalNumbers.GetEnumeratorWithPos())
    {
        var executionTime = GetExecutionTime(() => naturalNumbers.ElementAtRetainPos(enumeratorWithPos, 10000000));
        Console.WriteLine($"Time elapsed: {executionTime}");

        executionTime = GetExecutionTime(() => naturalNumbers.ElementAtRetainPos(enumeratorWithPos, 10000001));
        Console.WriteLine($"Time elapsed: {executionTime}");

        // force reset of enumerator by moving back to the previous
        executionTime = GetExecutionTime(() => naturalNumbers.ElementAtRetainPos(enumeratorWithPos, 10000001));
        Console.WriteLine($"Time elapsed: {executionTime}");
    }
}

public static IEnumerable<int> GetNaturalNumbers()
{
    Console.WriteLine("Running GetNaturalNumbers() from the beginning");
    for(int value = 0;; value++)
    {
        yield return value;
    }
}

public static System.TimeSpan GetExecutionTime(Action action)
{
    var stopwatch = Stopwatch.StartNew();

    action();

    stopwatch.Stop();
    return stopwatch.Elapsed;
}

First time it takes 00:00:00.1565686

Second request takes 00:00:00.0001373

Upvotes: 0

Gy&#246;rgy Kőszeg
Gy&#246;rgy Kőszeg

Reputation: 18013

Each ElementAt calls are independent ones. All IEnumerable extension methods are implemented so that they obtain a new enumerator.

If you use ReSharper you might also get a warning saying

Possible multiple enumerations of IEnumerable naturalNumbers

On the other hand, the ElementAt implementation really tries to do its best to retrieve the result quickly if possible. If the collection implements the IList<T> interface, then it will call its indexer in the first place. Otherwise, it uses the ultimate fallback mechanism by obtaining the enumerator.

What's the real reason behind this output?

Just ignore the first result, which also includes JIT time. To me the result seems to be ok.

Is it possible to continue the iteration from the point at which we ended in the first call?

Obtain the enumerator manually as per @DavidG's answer.

Upvotes: 1

DavidG
DavidG

Reputation: 118937

You are trying to not iterate over the same enumerable twice. To do this, you need to get a bit more manual with your handling of the values. Calling GetNaturalNumbers twice will give you two distinct enumerators, meaning your second operation just loops over all 10000001 items. You could do something like this which gets the IEnumerator<T> for the enumerable and manually moves over the incoming stream of numbers:

var enumerable = GetNaturalNumbers();
var enumerator = enumerable.GetEnumerator();

var fistElementToFind = 10000000;

for(var i = 0; i <= fistElementToFind; i++)
{
    enumerator.MoveNext();
}

var firstValue = enumerator.Current;
enumerator.MoveNext();
var nextValue = enumerator.Current;

Console.WriteLine($"First value: {firstValue}");
Console.WriteLine($"Next value: {nextValue}");

Upvotes: 2

Patrick
Patrick

Reputation: 5836

Everyone else seems to be talking about the speed, which doesn't seem to be your real question.

You are thinking that after you execute:

var executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000000));

And then execute:

executionTime = GetExecutionTime(() => naturalNumbers.ElementAt(10000001));

That it is supposed to know you already did it for 10000000 so it should only do 1 iteration of the loop.

That's simply not how it works. Your yield return will cut short the loop and return the value you need, but both of these statements are completely independent and it is going to run:

for(int value = 0;; value++)

Both times.

Upvotes: 5

Related Questions