Linq to Xml: Is XDocument a caching reader?

Question

I love the Linq to Xml API. Easiest one I've ever used.

I also remember that it is implemented on top of XmlReader, which is a non-caching, forward-only reader, meaning that:

var rdr = XmlReader.Create("path/to/huge_3Gb.xml");

...would return immediately (having probably read, at most, the XML header).
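Here is a minimal sketch of what "non-caching" means in practice. It moves a forward-only XmlReader through the input and counts movie elements without ever building a tree. It assumes a tiny in-memory sample in place of the 3 GB file. The helper name CountMovies and the movies/movie element names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Tiny stand-in for a huge file; a FileStream would be read the same way.
const string xml = "<movies><movie id=\"1\"/><movie id=\"2\"/><movie id=\"3\"/></movies>";

// Hypothetical helper: counts <movie> elements by advancing a forward-only reader.
// Only the current node is held; nothing already read is cached.
static int CountMovies(TextReader input)
{
    using XmlReader reader = XmlReader.Create(input);
    int count = 0;
    // ReadToFollowing advances until the next element named "movie" (or EOF).
    while (reader.ReadToFollowing("movie"))
    {
        count++;
    }
    return count;
}

int movieCount = CountMovies(new StringReader(xml));
Console.WriteLine(movieCount); // 3
```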

The documentation for XDocument.Load() confirms that it does indeed use XmlReader.Create().

As with everything LINQ, I expected deferred execution behaviour from Linq to Xml.
But then I tried this, as I usually do for anything that touches files:

using(var xdoc = XDocument.Load("file")){ ... }

and, surprise! It does not compile, because XDocument does NOT implement IDisposable!

Hmm, that's peculiar! How will I ever release the file handle when I'm done using the XDocument?

Then it dawned on me: maybe XDocument.Load() reads the whole XML into memory at once (and closes the file handle immediately)?

So I tried:

var xdoc = XDocument.Load("path/to/huge_3Gb.xml");

and waited, and waited, and then the process said:

Unhandled Exception: OutOfMemoryException.

So Linq to Xml is close to perfection (awesome API), but no cigar (when used on large XML files).

My questions are:

Am I missing something, and there IS a way to use Linq to Xml lazily?
If the answer to the previous question is 'No':

Is there an objective reason why the Linq to Xml API can't have deferred behaviour similar to, say, Linq to Objects? It looks to me as if at least some operations (e.g. those that are possible with the forward-only XmlReader) could be implemented lazily.

...Or is it simply not implemented this way, to quote Eric Lippert,

"because no one ever designed, specified, implemented, tested, documented and shipped that feature"?
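For the first question, a commonly used middle ground (Microsoft's LINQ to XML documentation describes a similar streaming approach) is a hybrid. A forward-only XmlReader walks the input, XNode.ReadFrom materializes one small XElement at a time, and ordinary LINQ operators run over that lazy sequence. This is a sketch under assumptions: the repeated element is called movie, StreamElements is a hypothetical helper name, and an in-memory string stands in for the large file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

const string xml =
    "<movies><movie id=\"1\"><title>A</title></movie>" +
    "<movie id=\"2\"><title>B</title></movie>" +
    "<movie id=\"3\"><title>C</title></movie></movies>";

// Hypothetical helper: lazily yields one XElement per matching element,
// materializing only that element's subtree, never the whole document.
static IEnumerable<XElement> StreamElements(TextReader input, string name)
{
    using (XmlReader reader = XmlReader.Create(input))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the next node, so no extra Read() call is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}

// Standard LINQ operators with deferred execution over the streamed elements.
var titles = StreamElements(new StringReader(xml), "movie")
    .Where(m => (int)m.Attribute("id") >= 2)
    .Select(m => (string)m.Element("title"))
    .ToList();

Console.WriteLine(string.Join(", ", titles)); // B, C
```

Only the element currently being yielded is held in memory, so the footprint depends on the size of one movie element rather than the whole file. The trade-off is losing random access: each yielded element has no parent, so axes such as Parent or ElementsBeforeSelf see only that fragment. The sequence should also be enumerated once, since it consumes the underlying reader.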

Sergey Berezovskiy · Accepted Answer

Actually, Linq to Xml does use deferred execution. But it queries in-memory data, not data from a file. You can load data from a file, a stream, or a string, or build the document manually; it does not matter how the in-memory node graph is constructed. Linq to Xml is used to query the in-memory representation of the XML tree (i.e. an object graph).

Here is a sample that shows how deferred execution works with Linq to Xml. Suppose you have an XDocument whose object graph contains a movies root element with two movie elements.

It does not matter how you create the in-memory representation of this XML data, e.g.:

 var xdoc = XDocument.Parse(xml_string);
 // or XDocument.Load(file_name);
 // or new XDocument(new XElement("movies"), ...)

Now define a query:

var query = xdoc.Descendants("movie");

Then modify the in-memory XML that the document holds:

xdoc.Root.Add(new XElement("movie", new XAttribute("id", 3)));

Now execute the query:

int moviesCount = query.Count(); // returns 3

As you can see, Linq to Xml uses deferred execution, but it works just like Linq to Objects: it is in-memory data that gets queried.
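A self-contained version of the sample above, as a sketch. The starting XML (two movie elements under a movies root) is assumed for illustration. The id attribute is passed to the new XElement's constructor, so it ends up on the new movie rather than on the root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed starting data: a <movies> root with two <movie> children.
var xdoc = XDocument.Parse("<movies><movie id=\"1\"/><movie id=\"2\"/></movies>");

// Defining the query does not walk the tree yet.
IEnumerable<XElement> query = xdoc.Descendants("movie");

// Change the in-memory tree after the query was defined.
xdoc.Root.Add(new XElement("movie", new XAttribute("id", 3)));

// The tree is traversed only now, so the newly added element is counted.
int moviesCount = query.Count();
Console.WriteLine(moviesCount); // 3
```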

NOTE: XDocument does not implement IDisposable because it does not hold any unmanaged resources once the node graph has been constructed.
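In practice, the thing worth disposing is whatever you load from, not the XDocument. XDocument.Load(path) already closes the file before returning; owning the stream yourself just makes that explicit. A minimal sketch, assuming a throwaway temporary file: open the stream in a using block, load, and keep querying the document after the stream has been closed.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Throwaway temporary file so the example is self-contained.
string path = Path.GetTempFileName();
File.WriteAllText(path, "<movies><movie id=\"1\"/><movie id=\"2\"/></movies>");

XDocument xdoc;
// The stream, not the XDocument, owns the file handle, so it goes in the using block.
using (FileStream stream = File.OpenRead(path))
{
    xdoc = XDocument.Load(stream);
} // the file handle is released here

// Deleting the file succeeds (on Windows this would throw if a handle were still open).
File.Delete(path);

// The document is still fully queryable, because the whole tree lives in memory.
int count = xdoc.Descendants("movie").Count();
Console.WriteLine(count); // 2
```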
