Nickon
Nickon

Reputation: 10156

Find first occurence of node without traversing all of them using XPaths and elementpath library

I use elementpath to handle some XPath queries. I have an XML with linear structure which contains a unique id attribute.

<items>
  <item id="1">...</item>
  <item id="2">...</item>
  <item id="3">...</item>
  ... 500k elements
  <item id="500003">...</item>
</items>

I want the parser to find the first occurence without traversing all the nodes. For example, I want to select //items/item[@id = '3'] and stop after iterating over 3 nodes only (not over 500k of nodes). It would be a nice optimization for many cases.

Upvotes: 1

Views: 33

Answers (1)

Martin Honnen
Martin Honnen

Reputation: 167716

An example using XSLT 3 streaming with a static parameter for the XPath, then using xsl:iterate with xsl:break to produce the "early exit" once the first item sought has been found would be

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">
  
  <xsl:param name="path" static="yes" as="xs:string" select="'items/item[@id = ''3'']'"/>

  <xsl:output method="xml"/>

  <xsl:mode on-no-match="shallow-copy" streamable="yes"/>

  <xsl:template match="/" name="xsl:initial-template">
    <xsl:iterate _select="{$path}">
      <xsl:if test="position() = 1">
        <xsl:copy-of select="."/>
        <xsl:break/>
      </xsl:if>
    </xsl:iterate>
  </xsl:template>

</xsl:stylesheet>

You can run it with SaxonC EE (unfortunately streaming is only supported by EE) and Python with e.g.

import saxonc

with saxonc.PySaxonProcessor(license=True) as proc:
    print("Test SaxonC on Python")
    print(proc.version)
    
    xslt30proc = proc.new_xslt30_processor()

    xslt30proc.set_parameter('path', proc.make_string_value('/items/item[@id = "2"]'))

    transformer = xslt30proc.compile_stylesheet(stylesheet_file='iterate-items-early-exit1.xsl')
    
    xdm_result = transformer.apply_templates_returning_value(source_file='items-sample1.xml')

    if transformer.exception_occurred:
        print(transformer.error_message)

    print(xdm_result)

Upvotes: 1

Related Questions