Mikey S.

Reputation: 3331

The interaction between yield and LINQ

I was reading a piece of code from the "XStreamingReader" library (which looks like a really cool way to run LINQ queries over XML documents without loading the whole document into memory, as an XDocument would) and was wondering about the following:

public IEnumerable<XElement> Elements()
{
    using (var reader = readerFactory())
    {
        reader.MoveToContent();
        MoveToNextElement(reader);
        while (!reader.EOF)
        {
            yield return XElement.Load(reader.ReadSubtree());
            MoveToNextFollowing(reader);
        }
    }
}

public IEnumerable<XElement> Elements(XName name)
{
    return Elements().Where(x => x.Name == name);
}

Regarding the second method, Elements(XName): it first calls Elements() and then uses Where() to filter its results, but I'm intrigued by the order of execution here, since Elements() contains a yield statement. From what I understand:

- Executing Elements() returns an IEnumerable collection; this collection does not physically contain any items YET.
- Where() is executed on that collection. Behind the scenes there's a loop which iterates through every item, and new items are "loaded" on the fly, since yield is being used.
- All items which matched the Where statement are returned as an IEnumerable collection, and are PHYSICALLY IN that collection.

First, am I correct with the above assumption? Second, in case I'm right: what if I wanted to return a "yielded" collection rather than a collection that is physically filled with all the filtered data? I'm asking because filling it would defeat the entire purpose of NOT reading a whole "matching" block into memory and instead iterating one matching element at a time...

Upvotes: 1

Views: 1305

Answers (3)

Merlyn Morgan-Graham

Reputation: 59111

All items which matched the Where statement are returned as an IEnumerable collection, and are PHYSICALLY IN that collection. First, am I correct with the above assumption?

No. Where implements an additional enumerator internally, which does what you want it to do. If the IEnumerable is never enumerated, then readerFactory is never called, the individual XElement instances are never created, and the filtering code never runs.
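To see the deferred behaviour for yourself, here is a minimal sketch. Numbers() and the Produced counter are made up for illustration; Numbers() just stands in for Elements() so the example needs no XML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many items the iterator body has actually produced.
    public static int Produced;

    // Hypothetical stand-in for Elements(): an iterator that records each item it yields.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 5; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;

        // Building the query runs neither the iterator body nor the predicate.
        IEnumerable<int> evens = Numbers().Where(n => n % 2 == 0);
        Console.WriteLine(Produced); // prints 0

        // Enumerating (here via Count) is what drives the iterator.
        int count = evens.Count();
        Console.WriteLine(Produced); // prints 5
        Console.WriteLine(count);    // prints 2
    }
}
```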

See Jon Skeet's article on re-implementing the behavior of the Where clause: http://msmvps.com/blogs/jon_skeet/archive/2010/09/03/reimplementing-linq-to-objects-part-2-quot-where-quot.aspx. He mimics the existing implementation (for explanatory purposes; there is no need to use his re-implementation in real code), and his code uses yield return.
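A rough sketch of the idea, not the article's code and not the framework's actual implementation: MyWhere is a made-up name, and the real Where also validates its arguments up front and has optimized paths for arrays and lists.

```csharp
using System;
using System.Collections.Generic;

public static class WhereSketch
{
    // Simplified, hypothetical re-implementation of Where using an iterator block.
    // Nothing in this body runs until the returned sequence is enumerated.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                // Hand one matching item to the caller, then pause until the next one is requested.
                yield return item;
            }
        }
    }

    public static void Main()
    {
        IEnumerable<int> big = new[] { 1, 2, 3, 4 }.MyWhere(n => n > 2);
        Console.WriteLine(string.Join(",", big)); // prints 3,4
    }
}
```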

Note that if you call ToList, though, then the entire enumeration will be evaluated and copied to a list, so be careful what you do with the IEnumerable that Where returns.

Also keep in mind that if the reader returned by readerFactory is reading from memory (e.g. a StringReader), then the document will exist physically in memory; there just won't be any DOM node instances until you enumerate them. And once you enumerate those elements, your document will exist twice in memory: once as the original text and once in DOM form. You may want to ensure that your streaming is done against a non-memory stream (e.g. directly from a file or network stream).
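As a sketch of streaming against a file rather than an in-memory string: StreamElements below is a simplified stand-in I wrote for illustration, not the library's Elements(), and the temp file and element names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class FileStreamingDemo
{
    // Hypothetical helper: lazily yields each element called `name`, one at a time.
    public static IEnumerable<XElement> StreamElements(Func<XmlReader> readerFactory, string name)
    {
        using (XmlReader reader = readerFactory())
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes the whole element and leaves the reader on the node after it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<root><item id=\"1\"/><item id=\"2\"/><other/></root>");
        try
        {
            // The factory opens a file stream; CloseInput makes disposing the reader close the file too.
            Func<XmlReader> factory = () =>
                XmlReader.Create(File.OpenRead(path), new XmlReaderSettings { CloseInput = true });

            IEnumerable<string> ids = StreamElements(factory, "item")
                .Select(e => (string)e.Attribute("id"));
            Console.WriteLine(string.Join(",", ids)); // prints 1,2
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Swapping the factory for one built on a StringReader would give the same results, but the whole document text would already be sitting in memory.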

Upvotes: 0

svick

Reputation: 244797

I assume that when you say items are physically in a collection, you mean there is a structure in memory that contains all the items right now. With Where(), that's not the case: it uses yield internally too (or something that acts the same as yield).

When you try to fetch the first item, Where() iterates the source collection until it finds the first item that matches. So the elements are streamed both in Elements() and in Elements(XName), and the whole collection is never in memory at once; it only passes through piece by piece.
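A small sketch that makes the ordering visible. Source() and Log are invented for the example; Source() plays the role of Elements().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingOrderDemo
{
    // Records the order in which the producer and the consumer run.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical stand-in for Elements(): logs each item just before yielding it.
    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 4; i++)
        {
            Log.Add("yield " + i);
            yield return i;
        }
    }

    public static void Main()
    {
        Log.Clear();

        // Each loop iteration makes Where pull from Source() only until the next match.
        foreach (int n in Source().Where(x => x % 2 == 0))
        {
            Log.Add("got " + n);
        }

        Console.WriteLine(string.Join(", ", Log));
        // prints: yield 1, yield 2, got 2, yield 3, yield 4, got 4
    }
}
```

If Where() filled a collection first, all four "yield" entries would appear before the first "got".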

Upvotes: 2

Amy B

Reputation: 110111

Where() is executed on that collection First, am I correct with the above assumption?

No. Where returns a lazy IEnumerable<XElement>. Later, when that IEnumerable<XElement> is enumerated, then the elements are yielded and filtered.

If the thing which enumerates that lazy IEnumerable happens to collect the elements (such as a call to ToList), then all the elements will be in memory at that point. If the thing which enumerates that lazy IEnumerable happens to process each item one at a time (such as a foreach loop, which does not retain a reference to the XElement), then only one item at a time will be in memory.
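The two kinds of consumer look roughly like this. Items() is a made-up stand-in for the filtered source, and strings stand in for XElement instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConsumeDemo
{
    // Hypothetical lazy source standing in for Elements().
    public static IEnumerable<string> Items()
    {
        yield return "a";
        yield return "bb";
        yield return "cc";
        yield return "d";
    }

    // Collecting consumer: ToList keeps every matching item alive in the list.
    public static List<string> CollectMatches()
    {
        return Items().Where(s => s.Length == 2).ToList();
    }

    // Streaming consumer: each item is used and then dropped, so nothing accumulates.
    public static int TotalLengthOfMatches()
    {
        int total = 0;
        foreach (string s in Items().Where(s => s.Length == 2))
        {
            total += s.Length;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CollectMatches().Count);  // prints 2
        Console.WriteLine(TotalLengthOfMatches());  // prints 4
    }
}
```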

Upvotes: 1
