Reputation: 2719
I have some XML (XBRL, actually) documents containing elements whose test attribute contains an XPath expression:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<link:linkbase
xmlns:ea="http://xbrl.org/2008/assertion/existence"
xmlns:generic="http://xbrl.org/2008/generic"
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xff="http://www.xbrl.org/2010/function/formula">
<generic:link xlink:role="http://www.xbrl.org/2003/role/link" xlink:type="extended">
<!-- .... -->
<va:valueAssertion
... some attribs ...
test="if(xff:has-fallback-value(xs:QName('someQName'))) then false() else (count($someVariable) ge 1)"
/>
<!-- ... -->
</generic:link>
</link:linkbase>
The convention for processing the XPath expression is that it uses the same prefix-to-namespace bindings as declared in the XML document.
We also have some (custom) linting mechanism with a rule that checks if declared prefixes and their namespace are "used" within the document.
This means that in the XML example above, the xff and xs prefixes should be recognised as "used", since they are present in the XPath expression. Standard tooling (within Java), however, doesn't help us see that this is the case.
I could, for example, take all the prefixes in scope and check whether I can find "prefix:" within the XPath string, but this seems like a very buggy solution, prone to both false positives and false negatives.
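To make that concern concrete, here is a minimal sketch of such a textual check. The class and method names (NaivePrefixScan, scan) are made up for illustration and are not part of any library.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper illustrating the naive "look for prefix:" idea. */
final class NaivePrefixScan {

    /** Returns every in-scope prefix p for which the text "p:" occurs in the expression. */
    static Set<String> scan(String expression, Collection<String> prefixesInScope) {
        Set<String> found = new TreeSet<>();
        for (String prefix : prefixesInScope) {
            // Pure substring match: no knowledge of string literals or token boundaries.
            if (expression.contains(prefix + ":")) {
                found.add(prefix);
            }
        }
        return found;
    }
}
```

With this sketch, scanning `concat('ea:', local-name())` reports ea as used although it only occurs inside a string literal, and scanning `xs:QName('a')` with a (hypothetical) prefix s in scope reports s as well, because "s:" is a substring of "xs:".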
Another way would be to "just" evaluate the XPath expression with every possible combination of namespace bindings and determine the minimal set of namespaces that still works. This won't cover everything either, since the evaluation could skip an entire branch of the expression (when an if expression is encountered, for example). Secondly, the number of possibilities explodes quite quickly, since we're talking about many (~100s of) documents, each containing multiple XPath expressions.
Does anyone know of a good approach to tackle this issue? Currently, we're using Scala on the JVM to implement the checks, so a native Java or Scala solution is preferred. Other JVM languages, or depending on non-Java tooling, are acceptable if need be.
Upvotes: 1
Views: 339
Reputation: 33000
Use javax.xml.xpath.XPath#compile(String) to parse all XPath expressions in a document. To know which namespace prefixes are referenced in an expression, prepare a NamespaceContext implementation that records the prefixes of the requested namespace bindings, and set it via XPath#setNamespaceContext(NamespaceContext) before you call the compile method.
Based on that prefix list, and given the namespace bindings in scope for the attribute which holds the expression string, you can then build a list of all used namespaces.
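A minimal sketch of this idea follows. RecordingNamespaceContext and UsedPrefixes are hypothetical names chosen for illustration, and the placeholder URI returned for undeclared prefixes is an assumption made so that compilation can continue. Note that the XPath implementation bundled with the JDK only supports XPath 1.0, so the sample uses an XPath 1.0 rewrite of the test expression; the `if ... then ... else` form from the question would need an XPath 2.0 capable implementation of the same API, and whether such an implementation consults the NamespaceContext at compile time would have to be verified.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

/** Hypothetical helper: remembers every prefix the XPath compiler asks it to resolve. */
final class RecordingNamespaceContext implements NamespaceContext {
    private final Map<String, String> inScope;            // prefix -> URI, as declared in the XML
    private final Set<String> requested = new LinkedHashSet<>();

    RecordingNamespaceContext(Map<String, String> inScope) {
        this.inScope = inScope;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        requested.add(prefix);
        // Assumption: hand back a placeholder URI for undeclared prefixes so compile() can proceed.
        String uri = inScope.get(prefix);
        return uri != null ? uri : "urn:undeclared:" + prefix;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return null; // reverse lookup is not needed for this purpose
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        return Collections.emptyIterator();
    }

    Set<String> requestedPrefixes() {
        return requested;
    }
}

/** Hypothetical entry point: compiles an expression and reports the prefixes it referenced. */
final class UsedPrefixes {
    static Set<String> in(String expression, Map<String, String> inScope) {
        RecordingNamespaceContext context = new RecordingNamespaceContext(inScope);
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(context);
        try {
            xpath.compile(expression); // parsing resolves prefixes through the context
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot parse: " + expression, e);
        }
        return context.requestedPrefixes();
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new HashMap<>();
        bindings.put("xff", "http://www.xbrl.org/2010/function/formula");
        bindings.put("xs", "http://www.w3.org/2001/XMLSchema");
        bindings.put("generic", "http://xbrl.org/2008/generic");
        String test = "not(xff:has-fallback-value(xs:QName('someQName'))) and count($someVariable) >= 1";
        System.out.println(UsedPrefixes.in(test, bindings));
    }
}
```

Only compilation is needed here, so no function or variable resolver is registered; nothing is evaluated. Mapping the recorded prefixes back through the bindings in scope for the attribute then gives the set of namespaces that the linting rule can treat as used.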
Upvotes: 1