Pithikos

Reputation: 20300

Does XPath inspect the DOM or the XML?

I am a bit confused about XPath, the DOM, and the actual XML.

From w3.org

XPath is a language for addressing parts of an XML document

From w3schools

XPath is used to navigate through elements and attributes in an XML document.

All this seems fine. However, XPath has the text() node test, and text nodes are admittedly part of the DOM specification. So is XPath actually inspecting the DOM?

Upvotes: 0

Views: 110

Answers (3)

Michael Kay

Reputation: 163458

XPath defines a data model, which is a tree representation of XML, and the semantics of XPath expressions are defined in relation to this data model. In XPath 1.0 the model is part of the XPath spec; in 2.0 it's a separate specification called XDM. It's similar to the DOM but not quite the same; for example, in the DOM, namespaces are accessible as attribute nodes, but in XDM they are represented by namespace nodes. The DOM allows you to represent things that don't correspond to any XML document (for example, namespace prefixes in names that aren't bound to any namespace URI), but XDM is stricter.
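
To make the namespace difference concrete, here is a small sketch using only Python's standard library. It assumes xml.dom.minidom as a representative DOM implementation, and uses xml.etree.ElementTree merely as a rough stand-in for a non-DOM tree model (ElementTree is not an XDM implementation). The DOM keeps the xmlns:p declaration as an ordinary attribute, while ElementTree does not list it as an attribute and instead folds the namespace URI into the expanded element name.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

SOURCE = '<root xmlns:p="urn:example"><p:item/></root>'


def dom_attribute_names(xml_text):
    """Attribute names the DOM exposes on the document element."""
    root = minidom.parseString(xml_text).documentElement
    return sorted(root.attributes.keys())


def etree_view(xml_text):
    """ElementTree's view: the root's attributes and the child's expanded name."""
    root = ET.fromstring(xml_text)
    return dict(root.attrib), root[0].tag


print(dom_attribute_names(SOURCE))  # ['xmlns:p']
print(etree_view(SOURCE))           # ({}, '{urn:example}item')
```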

Many XPath implementations work against tree models such as DOM, JDOM, or XOM which differ from XDM in minor details. Such an implementation in effect has to work out what to do when it encounters something that XDM doesn't allow: for example, what should happen when it encounters a DOM with adjacent or zero-length text nodes.
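
As a sketch of the adjacent/zero-length text-node case, assuming Python's xml.dom.minidom as the DOM: the DOM API happily lets you build an element whose content is split across several adjacent text nodes, one of them empty. An XPath engine running over such a tree has to decide how to present this; the DOM's own normalize() method produces the single-text-node shape the XPath data model expects.

```python
from xml.dom import minidom


def build_fragmented_paragraph():
    """Build a <p> whose content is split across three DOM text nodes."""
    doc = minidom.parseString("<p/>")
    p = doc.documentElement
    for chunk in ("Hello, ", "", "world"):
        p.appendChild(doc.createTextNode(chunk))
    return p


p = build_fragmented_paragraph()
print(len(p.childNodes))  # 3: two non-empty text nodes plus an empty one

# normalize() merges adjacent text nodes and drops empty ones,
# leaving one text node, which is what an XDM-style model would expect.
p.normalize()
print(len(p.childNodes))  # 1
print(p.firstChild.data)  # Hello, world
```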

So you're right that XPath semantics are defined in relation to navigation of a tree, but that tree, while DOM-like, is not actually DOM.

Upvotes: 1

IMSoP

Reputation: 97898

XPath and DOM are both ways of working with the structure of an XML document. The W3C have formalised this structure under the name XML Infoset, representing the information contained by an XML document independent of how that document is currently represented.

XML, with all its < and > characters, is the primary representation of that Infoset for transmission, although others are possible (e.g. Fast Infoset). But during processing of an XML document, you are not interested in how many times < appears; you're interested in the structure that the markup represents.
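
To illustrate that processing works on the structure rather than on the angle brackets, this sketch (assuming Python's xml.etree.ElementTree as the parser) parses two different serializations of the same content. One uses double quotes and a character reference; the other uses single quotes, extra whitespace inside the tags, and a literal character. After parsing, the two are indistinguishable.

```python
import xml.etree.ElementTree as ET


def summarize(xml_text):
    """Reduce a parsed element to (tag, attributes, text)."""
    el = ET.fromstring(xml_text)
    return el.tag, dict(el.attrib), el.text


a = summarize('<greeting lang="en">&#72;i</greeting>')
b = summarize("<greeting   lang='en' >Hi</greeting >")
print(a)       # ('greeting', {'lang': 'en'}, 'Hi')
print(a == b)  # True
```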

Both XPath and the DOM contain their own model of an XML document, which goes beyond the Infoset (in a carefully specified way) to provide higher levels of abstraction for traversing and manipulating a document. The similarity between the DOM's "Text Node" type and XPath's text() node test is simply down to the fact that it is a useful abstraction to have when working with an XML document. The Infoset treats every character as a distinct "character information item", but pretty much every processor is going to want to assemble consecutive characters into a single string.
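
The point that each model chooses its own abstraction for runs of characters can be seen by parsing the same mixed-content element with two of Python's standard-library tree models (chosen here purely for illustration). xml.dom.minidom produces DOM Text nodes, matching the two nodes that the XPath 1.0 expression /p/text() would select; xml.etree.ElementTree has no text nodes at all and keeps the same characters in text and tail properties instead.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

SOURCE = "<p>one <b>two</b> three</p>"


def dom_text_children(xml_text):
    """Data of the root element's direct Text-node children, in the DOM."""
    root = minidom.parseString(xml_text).documentElement
    return [n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE]


def etree_text_slots(xml_text):
    """Where ElementTree keeps the same characters: root.text, b.text, b.tail."""
    root = ET.fromstring(xml_text)
    b = root[0]
    return root.text, b.text, b.tail


print(dom_text_children(SOURCE))  # ['one ', ' three']
print(etree_text_slots(SOURCE))   # ('one ', 'two', ' three')
```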

The DOM defines its model as a series of objects with strictly defined interfaces, for use in Object-Oriented Programming; it is actually somewhat independent of XML and the XML Infoset, having both its origins and current development focus as a model for interacting with web documents written in HTML. The model used by XPath has now been split into its own specification, the XQuery and XPath Data Model; it is explicitly built from an XML Infoset, in such a way as to allow structured queries.

Upvotes: 1

LarsH

Reputation: 28004

When you say "the actual XML", do you mean the sequence of characters, as opposed to a tree structure in memory after the sequence of characters is parsed?

XPath operates on the tree structure of a parsed XML (or HTML) document. This is what the sentences you quoted are referring to when they say "an XML document."

DOM (Document Object Model) is one type of tree structure used to represent the structure of a parsed XML document in memory. So yes, XPath operates on the DOM. XPath does not operate on an unparsed sequence of characters.
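
A minimal sketch of the parse-then-query order, assuming Python's standard library: xml.etree.ElementTree supports only a limited subset of XPath 1.0, and its tree is not a DOM, but it is enough to show that the path expression is evaluated against the parsed tree rather than against the raw string. The helper name find_item_text is hypothetical.

```python
import xml.etree.ElementTree as ET

RAW = """<catalog>
  <item id="1">Apple</item>
  <item id="2">Banana</item>
</catalog>"""


def find_item_text(raw_xml, item_id):
    """Parse first, then evaluate a (limited) XPath expression on the tree."""
    tree = ET.fromstring(raw_xml)  # sequence of characters -> tree
    matches = tree.findall(f".//item[@id='{item_id}']")
    return [m.text for m in matches]


print(find_item_text(RAW, "2"))  # ['Banana']
```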

The HTML tab in Firebug shows the DOM as a collapsible tree structure.

Upvotes: 0
