wolfmason

Reputation: 409

XPath in Schematron: How to determine if an xmlns attribute is present on a node

I'm trying to match any instance of a specific element that lacks an xmlns attribute, but I'm having trouble getting a match with the syntax. My xml is as shown:

<root>
<node xmlns:m="http://google.com"/>
<node style="block"/>
</root>

I want to return the first node, but not the second. If I were matching on the style attribute shown on the second node, I could simply use not(@style), but the equivalent not(@xmlns:m) doesn't work. I've tried to work around this by searching for any attribute whose value matches the URI; that works for other attributes, but not for xmlns:m. Is there some sort of limitation or syntax quirk that's required to match/parse xmlns attributes with XPath?

Upvotes: 3

Views: 963

Answers (2)

LarsH

Reputation: 28004

As stated elsewhere, the question asks for something that XPath, and XML tools in general, are not designed to do: extract information about namespace declarations. XPath is designed to be able to reliably detect what namespace (as identified by its namespace URI, not its prefix) any element or attribute is in, and to select nodes based on their namespace. For that reason, any method to detect namespace declarations using standard XML tools is doomed to be unreliable.

Building on Mathias' answer, I would say to use this XPath test:

namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')
         and not(. = ../../namespace::*)]

(tested using http://www.qutoric.com/xslt/analyser/xpathtool.html). In a case like

<root>
  <node xmlns:m="http://google.com">
    <node style="block"/>
  </node>
</root>

the above XPath expression is truthy for only one node element, the outer one, thus satisfying the OP's question; whereas Mathias' expression would be truthy for both node elements.

It works by testing for namespace nodes (on the current element) whose namespace URIs are not shared by the parent element's namespace nodes.
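
Dropped into a Schematron schema like the one in Mathias' answer, the test might look like the sketch below. The rule context, the xslt2 query binding, and the message text are modeled on that answer and are assumptions about your setup; the engine must still support the namespace:: axis.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">

    <sch:pattern>
        <!-- Assumption: "node" is the element to check, as in the question. -->
        <sch:rule context="node">
            <!-- Fires when the element has a namespace node (other than the
                 built-in xml: one) whose URI is not among its parent's. -->
            <sch:report test="namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')
                                           and not(. = ../../namespace::*)]">Element introduces a namespace URI that its parent does not have in scope.</sch:report>
        </sch:rule>
    </sch:pattern>

</sch:schema>
```

The line break inside the test attribute is only for readability; attribute-value normalization turns it into whitespace, which XPath ignores between tokens.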

However, this XPath expression will not always detect namespace declarations either. For example, in

<root>
  <node xmlns:m="http://google.com">
    <node xmlns:g="http://google.com" style="block"/>
  </node>
</root>  

the above XPath expression would be truthy only for the outer node, and would not detect the namespace declaration on the inner one. Again, this is because namespace declarations were intended only as a way to make it easier to specify what elements and attributes were in what namespaces, not as significant information carriers in themselves.

Granted, the above example seems unrealistic, because the inner namespace declaration is redundant. Nevertheless it is well-formed XML, and could easily be generated by well-behaved programs that produce the inner <node> without direct knowledge of the outer <node>'s namespace declarations.

Additional caveat: The namespace:: axis is deprecated in XPath 2.0 and later, so it may not be supported by whatever engine you use to run Schematron.
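
If the namespace:: axis is unavailable, XPath 2.0 provides the functions in-scope-prefixes() and namespace-uri-for-prefix(), which expose the same in-scope information. The rule below is an untested sketch of the same comparison written with those functions. It assumes an XSLT 2.0 query binding and that the parent of the context element is itself an element (in-scope-prefixes() expects an element argument, so this would not work as written on the document element). It shares the limitation described above: a redundant redeclaration of a URI that is already in scope goes undetected.

```xml
<!-- Fragment: place inside an sch:pattern of a schema with queryBinding="xslt2". -->
<sch:rule context="node">
    <!-- True if some in-scope prefix (other than xml) maps to a URI that
         none of the parent's in-scope prefixes map to. -->
    <sch:report test="some $p in in-scope-prefixes(.)[. != 'xml'] satisfies
                      not(namespace-uri-for-prefix($p, .) =
                          (for $q in in-scope-prefixes(..)
                           return namespace-uri-for-prefix($q, ..)))">Element introduces a namespace URI that its parent does not have in scope.</sch:report>
</sch:rule>
```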

Upvotes: 2

Mathias Müller

Reputation: 22647

Is there some sort of limitation or syntax quirk that's required to match/parse xmlns attributes with XPath?

Yes, kind of. The quirk is that things like

xmlns:m="..."

syntactically are attributes, but serve a more specific role than attributes. They are namespace declarations that bind prefixes to a namespace URI. The prefixes can then be used to qualify element and attribute names. There is also a default namespace that is not bound to a prefix.

Namespace declarations cannot be detected directly, because XPath (like XSLT and Schematron) does not operate on the actual XML document but on an abstract representation of it. In this representation (a data model), namespace declarations are absent, but there are namespace nodes, which indirectly point to namespace declarations.

Once an XML parser has processed an XML document, namespaces and attributes are distinct types of nodes that you can access with XPath axes. I am not sure I understand why you would want to do that, but you can report namespace nodes using the namespace:: axis:

namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]

You have to be careful and exclude the predefined namespace URI

http://www.w3.org/XML/1998/namespace

which is bound to the xml: prefix by default.

ISO Schematron

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">

    <sch:pattern>
        <sch:rule context="node">
            <sch:report test="namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]">Namespace node found!</sch:report>
        </sch:rule>
    </sch:pattern>

</sch:schema>

The document you show will not be valid against this Schematron file, and the validator will point to the node element with the namespace declaration:

<node xmlns:m="http://google.com"/>

as the source of the error.


Please Note

The namespace:: axis selects namespace nodes, not namespace declarations. A namespace declaration is in scope for the element that carries it and for all of that element's descendants, so the declaring element is not the only one with a namespace node. Its descendants will also have a corresponding namespace node:

<root>
  <node xmlns:m="http://google.com">
    <descendant_element_with_namespace_node/>
  </node>
  <node style="block"/>
</root>

See LarsH's answer for a more sophisticated XPath expression that accounts for this fact.

Upvotes: 6
