Noman_1

Reputation: 530

If IEnumerable data source is changed, it changes the results

Given the following code:

using System.Linq;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        //Init data
        char[] chars = new char[10];
        FillData(chars);

        // Write the initial data
        PrintContents("Initial data:", chars);
        //Take some data:
        IEnumerable<char> acc = chars.Take(3);
        //View data
        PrintContents("Enum:", acc);

        //Edit data
        chars[0] = 'z';
        chars[1] = 'z';
        chars[2] = 'z';

        //View data again
        PrintContents("Enum after modifing source:", acc);

        //Restart data
        chars = new char[5];
        FillData(chars);

        //View data when source is replaced
        PrintContents("Enum after new source:", acc);
    }

    //Gets a ref
    private static void FillData(char[] data)
    {
        for(int i = 0; i < data.Length; i++)
        {
            data[i] = (char)('a' + i);
        }
    }

    private static void PrintContents(string what, IEnumerable<char> src)
    {
        System.Console.WriteLine(what);
        string s = "";
        foreach(char ch in src)
        {
            s += ch;
        }
        if(s.Length > 0)
        {
            System.Console.WriteLine(s);
        }
    }
}

I get this output:

Initial data:
abcdefghij
Enum:
abc
Enum after modifing source:
zzz
Enum after new source:
zzz

I know about deferred execution, but is this the expected behaviour? It means I should never reuse an IEnumerable, or any data an IEnumerable was built from, without creating a new collection, because I might change the program's results.

It also means that the IEnumerable holds a reference to its data source, even when the visible code no longer uses that source, so the source will not be garbage collected until the IEnumerable itself can be collected.

I have been using IEnumerable a lot on a recent project, and the more I see it the less I like it. Don't get me wrong, LINQ does a great job, but I would prefer it to sometimes return the same type as the source.

Upvotes: 2

Views: 837

Answers (1)

Alexei Levenkov

Reputation: 100545

Yes, this is expected behavior.

You should think of the result of a LINQ method as "compute the result when I enumerate" and not as "a collection of items". For me, that makes it easier to understand that enumerating a second time computes the results again as I walk through the items.
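A minimal sketch of that mental model, reduced from the code in the question (the class name DeferredDemo and the method Observe are made up for illustration). It also shows why the last step in the question still prints zzz: the query holds a reference to the original array, so assigning a new array to the variable does not affect it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Enumerates the same query at three points in time and records what it yields.
    public static List<string> Observe()
    {
        var results = new List<string>();
        char[] chars = { 'a', 'b', 'c', 'd' };

        // Nothing is read here; Take only remembers the source array and the count.
        IEnumerable<char> firstTwo = chars.Take(2);
        results.Add(string.Concat(firstTwo)); // "ab"

        // Mutating the array the query was built on shows up on the next enumeration.
        chars[0] = 'z';
        results.Add(string.Concat(firstTwo)); // "zb"

        // Reassigning the variable does not touch the query: it still refers to the old array.
        chars = new[] { 'x', 'y' };
        results.Add(string.Concat(firstTwo)); // "zb"

        return results;
    }

    public static void Main()
    {
        foreach (string line in Observe())
        {
            Console.WriteLine(line);
        }
    }
}
```

Each call to string.Concat walks the query from the start, so the three results reflect the state of the original array at the moment of each enumeration.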

It matters a lot when the source data may change (as in the sample in the question) or when obtaining the result is costly (querying a DB is a very common case of hidden cost). Unfortunately there is no common way to tell whether an enumerable is costly (e.g. a DB query) or essentially free (e.g. a list), and both cases, repeated querying for live data and repeated enumeration of a cached result, are commonly used. IQueryable is somewhat of an indication of a costly, lazily evaluated enumerable, but a bare IEnumerable tells you nothing about how costly or up to date the results will be.

On your concern that queries keep data sources alive for possibly longer than you expect: yes, that is a valid concern. You should understand how the result is expected to be used and consider whether returning a non-lazy result (e.g. with .ToList()) is better. Be careful when getting data from disposable sources (DB, files, and non-seekable sources like network streams): it is often easier to force evaluation of the query and return a List (or any other non-lazy collection) so that you control how and when the data source is disposed.
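As a rough illustration of both points, the sketch below first takes a snapshot with .ToList() so later changes to the array are not seen, then contrasts eager and lazy reading from a disposable source. The names SnapshotDemo, ReadLines, LoadEager and LoadLazy are hypothetical, and a StringReader stands in for a real file or DB connection.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SnapshotDemo
{
    // ToList copies the elements immediately, so the list no longer depends on the array.
    public static string SnapshotAfterMutation()
    {
        char[] chars = { 'a', 'b', 'c', 'd' };
        List<char> snapshot = chars.Take(3).ToList();
        chars[0] = 'z';
        return string.Concat(snapshot); // "abc"
    }

    // Hypothetical helper: yields lines lazily, so the reader must stay open while enumerating.
    private static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Forces evaluation inside the using block and returns a plain list.
    public static List<string> LoadEager(string text)
    {
        using (var reader = new StringReader(text))
        {
            return ReadLines(reader).ToList();
        }
    }

    // Returns the lazy sequence; the reader is already disposed when the caller enumerates it.
    public static IEnumerable<string> LoadLazy(string text)
    {
        using (var reader = new StringReader(text))
        {
            return ReadLines(reader);
        }
    }

    public static void Main()
    {
        Console.WriteLine(SnapshotAfterMutation());     // abc
        Console.WriteLine(LoadEager("a\nb").Count);     // 2
        try
        {
            Console.WriteLine(LoadLazy("a\nb").Count());
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("reader was disposed before enumeration");
        }
    }
}
```

The eager version pays the full cost up front but leaves nothing referring to the reader, while the lazy version fails only when someone finally enumerates it, which can be far from the code that caused the problem.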

For example, you should strongly consider passing non-lazy enumerables to ASP.NET MVC views: the data may easily be iterated multiple times during rendering (even .Count() is an iteration), so a lazily computed enumerable over a DB can easily double or triple the cost of rendering the page.
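The effect can be sketched without ASP.NET or a database. In the example below, ExpensiveQuery is a hypothetical stand-in for a DB query that counts how often it actually runs, and Render imitates a view that calls .Count() and then lists the items; none of these names come from a real framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    private static int queryRuns;

    // Hypothetical stand-in for a database query; the counter goes up each time enumeration starts.
    private static IEnumerable<int> ExpensiveQuery()
    {
        queryRuns++;
        for (int i = 1; i <= 3; i++)
        {
            yield return i;
        }
    }

    // Imitates a view: asks for the count, then walks the items to render them.
    private static string Render(IEnumerable<int> model)
    {
        return model.Count() + " items: " + string.Join(", ", model);
    }

    // Passing the lazy sequence: Count() and string.Join each run the query.
    public static int RunsWhenLazy()
    {
        queryRuns = 0;
        Render(ExpensiveQuery());
        return queryRuns;
    }

    // Passing a list: the query runs once, in ToList(); the view only reads the list.
    public static int RunsWhenMaterialized()
    {
        queryRuns = 0;
        Render(ExpensiveQuery().ToList());
        return queryRuns;
    }

    public static void Main()
    {
        Console.WriteLine(RunsWhenLazy());         // 2
        Console.WriteLine(RunsWhenMaterialized()); // 1
    }
}
```

With a real database behind the sequence, each extra run would typically mean another round trip, which is where the doubled or tripled rendering cost comes from.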

Upvotes: 3
