Stepan RYBAR

Reputation: 67

XSL-T: How to get the length of XML tree in characters other than "string-length(serialize(.))"?

Good afternoon!

The question: How can I get the length of an XML tree in characters using XSL-T or XPath in Saxon?

The goal: I would like to transform XML into a "large" CSV and a "small" CSV based on the size of the second-level elements (/root/secondLevelElement). The size is expressed as a number of characters.

Edit: This whole effort is about ETL (extract, transform, load) of XML into an SQL database under a heavy, continuous, parallel load, in the following way: Application Server -> extract to XML file -> transform from XML file to CSV file using XSL-T -> import into database. One XML file will contain from 20,000 to 50,000 secondLevelElement elements, depending on the configuration of the script. Each secondLevelElement can be from 5 to 15+ element levels deep. The last column of the CSV will be the full secondLevelElement XML, ready to be imported as VARCHAR2(4000) or CLOB, while the preceding columns will hold metadata extracted by XPath from the secondLevelElement. Because the character length is crucial during import into the database, I need to know the EXACT length of each full secondLevelElement XML.

The problem: I have found the following solution, which uses "string-length(serialize(.))" in an XSL-T 3.0 stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  exclude-result-prefixes="xs" 
  version="3.0">
  <xsl:output 
    method="text"/>
  <xsl:template 
    match="/root/secondLevelElement">
    <xsl:value-of 
      select="string-length(serialize(.))"/>
  </xsl:template>
</xsl:stylesheet>

but it seems quite slow for large XML files. Is there a faster solution, such as a Saxon extension in Saxon PE or EE?

Thank you in advance for your tips. Stepan

Upvotes: 0

Views: 1164

Answers (2)

Stepan RYBAR

Reputation: 67

Saxon HE 9.6.n.n for Java, released on 2014-10-02, supports XPath 3.0, which includes both the serialize() and string-length() functions, so string-length(serialize(myElement)) is my choice for now.
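For illustration, here is a minimal sketch of how that expression might be used to sort records into "small" and "large". It serializes each element only once and keeps the result in a variable, so the same string could also be written out as the last CSV column. The parameter name "limit" and its value 4000 are assumptions taken from the VARCHAR2 size mentioned in the question. The stylesheet keeps version="3.0" as in the question; whether a given Saxon edition accepts that is something to check. The exact count also depends on the serialization parameters in effect, so they should match whatever is actually written to the CSV.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="3.0">
  <xsl:output method="text"/>

  <!-- Assumed threshold: the VARCHAR2 size from the question -->
  <xsl:param name="limit" as="xs:integer" select="4000"/>

  <xsl:template match="/root">
    <xsl:for-each select="secondLevelElement">
      <!-- Serialize once; $xml could also be emitted as a CSV column -->
      <xsl:variable name="xml" select="serialize(.)"/>
      <xsl:variable name="len" select="string-length($xml)"/>
      <!-- One line per record: category, then length in characters -->
      <xsl:value-of select="if ($len le $limit) then 'small' else 'large'"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="$len"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This only prints the category and the length. Writing the serialized XML itself into a CSV column would additionally need quoting of commas, quotes, and line breaks, which is not shown here.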

Upvotes: 0

Michael Kay

Reputation: 163458

If by "length of XML tree" (a strange concept: trees have height and breadth, but not length) you do actually mean the number of characters in the serialized output, then a pretty close approximation will be something like

sum(.//*/(string-length(name())*2 + 5))
+ sum(.//@*/(string-length(name()) + string-length(.) + 4))
+ sum(.//text()/string-length())

Computing that should be a fair bit faster than actually serializing.

It doesn't allow for empty element tags, namespace declarations, comments, or processing instructions, but it's not clear how accurate you need to be.
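As a sketch of how one of those gaps might be narrowed, the variant below wraps the same arithmetic in a stylesheet function and adds two adjustments: it counts the context element's own tags by using descendant-or-self, and it assumes an element with no child nodes is serialized as an empty element tag (name length plus 3 characters). The function name f:approx-length and the namespace urn:example:functions are hypothetical. It still ignores namespace declarations, comments, processing instructions, and character escaping such as &amp;lt; so it remains an approximation and does not give the exact length.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:functions"
  exclude-result-prefixes="xs f"
  version="2.0">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: estimated serialized length of an element -->
  <xsl:function name="f:approx-length" as="xs:integer">
    <xsl:param name="e" as="element()"/>
    <!-- Terms, in order:
         elements with children: start tag + end tag = 2 * name + 5
         elements without children: empty element tag = name + 3
         attributes: space, name, equals sign, two quotes, value
         text nodes: their string length -->
    <xsl:sequence select="
        sum($e/descendant-or-self::*[node()]/(string-length(name()) * 2 + 5))
      + sum($e/descendant-or-self::*[not(node())]/(string-length(name()) + 3))
      + sum($e//@*/(string-length(name()) + string-length(.) + 4))
      + sum($e//text()/string-length(.))"/>
  </xsl:function>

  <xsl:template match="/root">
    <xsl:for-each select="secondLevelElement">
      <xsl:value-of select="f:approx-length(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Checking by hand on `<a x="1"><b/>hi</a>`: 7 for a, 4 for b, 6 for the attribute, and 2 for the text give 19, which matches the 19 characters of that string.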

Upvotes: 1
