nixau

Reputation: 1095

Streaming XPath evaluation

Are there any production-ready libraries for streaming evaluation of XPath expressions against a given XML document? My investigation shows that most existing solutions load the entire DOM tree into memory before evaluating the XPath expression.

Upvotes: 16

Views: 7279

Answers (7)

Jarekczek

Reputation: 7876

I think I'll go for custom code. The .NET library gets us quite close to the target if one just wants to read some paths of the XML document.

Since all the solutions I have seen so far support only a subset of XPath, this one is of the same kind. The subset is really small, though. :)

This C# code reads an XML file and counts the nodes at an explicit path. You can also read attributes easily with the xr["attrName"] syntax.

  using System;
  using System.Collections.Generic;
  using System.IO;
  using System.Text;
  using System.Xml;

  class CountNodes {
    // asArgs[1] holds the path of the XML file to read.
    static void Main(string[] asArgs) {
      int c = 0;
      var lstPath = new List<string>();  // names of the currently open elements
      var sbPath = new StringBuilder();
      using (var r = new StreamReader(asArgs[1]))
      using (var xr = XmlReader.Create(r, new XmlReaderSettings())) {
        while (xr.Read()) {
          //Console.WriteLine("type " + xr.NodeType);
          if (xr.NodeType == XmlNodeType.Element) {
            lstPath.Add(xr.Name);
          }

          // Rebuilding the path on every node takes some time. If 1 unit is
          // the time needed for parsing the file, then this takes about 1.0.
          sbPath.Clear();
          foreach (string n in lstPath) {
            sbPath.Append('/');
            sbPath.Append(n);
          }
          // This takes about 0.6 time units.
          string sPath = sbPath.ToString();

          // An element is complete at its end tag, or at once if it is
          // empty (<a/>), because no EndElement node follows in that case.
          if (xr.NodeType == XmlNodeType.EndElement || xr.IsEmptyElement) {
            // Counts someElement at any depth below a root named main.
            if (xr.Name == "someElement" && lstPath[0] == "main")
              c++;
            // Or test a simple XPath explicitly:
            // if (sPath == "/main/someElement")
            lstPath.RemoveAt(lstPath.Count - 1);
          }
        }
      }
      Console.WriteLine(c);
    }
  }
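
The attribute access mentioned above can be sketched in the same style. This is only an illustration under assumptions: the class AttributeStreamDemo, the method ReadAttributes, the sample document and the attribute name "id" are all made up for the example, and the path test is an exact string comparison, not real XPath.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class AttributeStreamDemo
{
    // Hypothetical helper: collects the value of attrName from every element
    // located at exactly the given path (e.g. "/main/someElement").
    public static List<string> ReadAttributes(TextReader input, string path, string attrName)
    {
        var values = new List<string>();
        var open = new List<string>();   // names of the currently open elements
        using (var xr = XmlReader.Create(input))
        {
            while (xr.Read())
            {
                if (xr.NodeType == XmlNodeType.Element)
                {
                    open.Add(xr.Name);
                    if ("/" + string.Join("/", open) == path)
                    {
                        string v = xr[attrName];   // null when the attribute is absent
                        if (v != null) values.Add(v);
                    }
                }
                // Empty elements (<a/>) produce no EndElement node, so pop them here too.
                if (xr.NodeType == XmlNodeType.EndElement || xr.IsEmptyElement)
                {
                    open.RemoveAt(open.Count - 1);
                }
            }
        }
        return values;
    }

    static void Main()
    {
        // Made-up sample: two matches at /main/someElement, one deeper non-match.
        string xml = "<main><someElement id=\"a\"/><other><someElement id=\"x\"/></other>"
                   + "<someElement id=\"b\">text</someElement></main>";
        foreach (string v in ReadAttributes(new StringReader(xml), "/main/someElement", "id"))
            Console.WriteLine(v);   // prints a, then b
    }
}
```

Here the path string is built only when an element starts, not on every node, which avoids most of the cost measured in the comments above.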

Upvotes: 0

grtjn

Reputation: 20414

Though I have no practical experience with it, I think QuiXProc (http://code.google.com/p/quixproc/) is worth mentioning. It is a streaming approach to XProc, and it uses libraries that provide streaming support for XPath, among others.

Upvotes: 1

Dimitre Novatchev

Reputation: 243489

XSLT 3.0 provides a streaming mode of processing, and this will become a standard once the XSLT 3.0 specification becomes a W3C Recommendation.

At the time of writing this answer (May 2011), Saxon provides some support for XSLT 3.0 streaming.

Upvotes: 4

David

Reputation: 311

FWIW, I've used Nux streaming filter XPath queries against very large (>3 GB) files, and it has both worked flawlessly and used very little memory. My use case has been slightly different (not validation-centric), but I'd highly encourage you to give Nux a shot.

Upvotes: 0

lavinio

Reputation: 24309

There are several options:

  • DataDirect Technologies sells an XQuery implementation that employs projection and streaming where possible. It can handle files in the multi-gigabyte range, i.e. larger than available memory. It's a thread-safe library, so it's easy to integrate. Java only.

  • Saxon is an open-source implementation, with a modestly priced paid cousin, which will do streaming in some contexts. Java, but with a .NET port as well.

  • MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.

Upvotes: 3

FoxyBOA

Reputation: 5846

Try Joost.

Upvotes: 1

Brian Agnew

Reputation: 272337

Would this be practical for a complete XPath implementation, given that XPath syntax allows for:

/AAA/XXX/following::*

and

/AAA/BBB/following-sibling::*

which imply look-ahead requirements? That is, from a particular node you are going to have to load the rest of the document anyway.

The doc for the Nux library (specifically StreamingPathFilter) makes this point, and references some implementations that rely on a subset of XPath. Nux claims to offer some streaming query capability, but given the above there will be some limitations in its XPath support.
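
To make the look-ahead point concrete, here is a rough sketch of /AAA/BBB/following-sibling::* evaluated in one forward pass with .NET's System.Xml.XmlReader. The class name, the method name and the sample document are invented for the illustration, and only element names are returned. The results become known only as the reader arrives at each sibling, so nothing can be said about them while still positioned on BBB. A step that needs them at that point would have to buffer or read ahead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class FollowingSiblingDemo
{
    // Hypothetical helper: forward-only evaluation of
    // /AAA/BBB/following-sibling::* that returns element names only.
    public static List<string> FollowingSiblings(TextReader input)
    {
        var result = new List<string>();
        bool seenBbb = false;   // true once a /AAA/BBB start tag has been read
        using (var xr = XmlReader.Create(input))
        {
            while (xr.Read())
            {
                if (xr.NodeType != XmlNodeType.Element) continue;
                // Depth 0 is the root element, depth 1 its children.
                if (xr.Depth == 0 && xr.Name != "AAA") break;   // root is not AAA: no matches
                if (xr.Depth != 1) continue;
                // A sibling is reported only when the reader reaches it.
                if (seenBbb) result.Add(xr.Name);
                if (xr.Name == "BBB") seenBbb = true;
            }
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample document.
        string xml = "<AAA><XXX/><BBB><CCC/></BBB><DDD/><BBB/><EEE>t</EEE></AAA>";
        // prints DDD,BBB,EEE
        Console.WriteLine(string.Join(",", FollowingSiblings(new StringReader(xml))));
    }
}
```

A forward axis like this one can at least be reported late. A reverse axis such as preceding-sibling would require keeping earlier nodes in memory, which is one reason streaming implementations tend to support only a subset of XPath.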

Upvotes: 3
