Luis Ortiz

Reputation: 3

XSLT Remove Blank Space - XML CDATA

I want to extract the XML inside the CDATA section. I tried this XSL, but I need to remove the whitespace before it.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns="http://sertex.com/Consult"
version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" />
  <xsl:template match="/">
    <xsl:value-of select="//ns:Input/text()" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>

How can I remove the blank space before the XML declaration? This is my result:

Can you help me on that?

Upvotes: 0

Views: 1398

Answers (2)

John Bollinger

Reputation: 180998

In the first place, it is worthwhile to understand the problem: the content of the <ns0:Input> element in your input XML contains whitespace before the CDATA section. XSLT automatically strips some whitespace, but that particular whitespace does not qualify, and cannot be made to qualify even by manipulating the XSLT whitespace-stripping parameters, because whitespace stripping applies only to whitespace-only text nodes. Adjacent text nodes are merged before that analysis is performed, so even if you suppose that the CDATA section is initially parsed as a separate text node, the fact that the whitespace is outside the CDATA section does not change anything.
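To make that concrete, here is a sketch of what such an input might look like. I am guessing at the details: the wrapper element name ns0:Request and the embedded payload are hypothetical, and only ns0:Input and the namespace URI are taken from the stylesheets in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: ns0:Request and the embedded payload are made up
     for illustration; only ns0:Input and the namespace URI are from the thread. -->
<ns0:Request xmlns:ns0="http://sertex.com/Consult">
  <ns0:Input>
    <![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>
<payload><item>1</item></payload>]]>
  </ns0:Input>
</ns0:Request>
```

In this sketch the single text node of ns0:Input starts with a newline and four spaces, followed by the CDATA content. Since that node is not whitespace-only, a declaration such as <xsl:strip-space elements="*"/> would leave it untouched.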

It is understandable that you want to omit any leading whitespace from the output, since none may precede the output XML declaration. Another answer offers normalize-space() as a way to do that, but it has broader effects than just on the leading whitespace. If you want to preserve all whitespace other than the leading whitespace, then you need to go to a bit more effort. For example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://sertex.com/Consult"
    version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes" />

  <xsl:template match="/">              
    <xsl:variable name="embedded-text"
        select="//ns:Input/text()"/>
    <xsl:variable name="first-non-ws"
        select="substring(normalize-space($embedded-text), 1, 1)"/>
    <xsl:variable name="leading-ws-count"
        select="string-length(substring-before($embedded-text, $first-non-ws))"/>
    <xsl:value-of select="substring($embedded-text, $leading-ws-count + 1)"
        disable-output-escaping="yes" />          
  </xsl:template>

</xsl:stylesheet>

Having said that, I feel obliged to add that it is highly questionable that the encoding specified on the XML declaration in the resulting output differs from both UTF-8 and UTF-16, and is not specified in an xsl:output element as the encoding to use. This creates a guaranteed mismatch between declared and actual encoding of the output document. If the XSLT processor happens to use UTF-8 instead of UTF-16, that could be mitigated by the embedded XML using only characters among those encoded identically by UTF-8 and the encoding specified by the embedded XML declaration (ISO-8859-1). Note that the XSLT processor is also allowed to choose UTF-16, in which case you're toast.
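One possible way to line the two up, sketched under the assumption that the embedded document really does declare ISO-8859-1 and that your processor supports that encoding, is to request it in xsl:output. The variant below also switches to method="text", which writes the string value without escaping, so disable-output-escaping is no longer needed. Treat it as a sketch rather than a tested drop-in.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://sertex.com/Consult"
    version="1.0">

  <!-- Assumption: the embedded document declares ISO-8859-1, so request
       that encoding for the result. With method="text" the string is
       written without escaping, so disable-output-escaping is not needed. -->
  <xsl:output method="text" encoding="ISO-8859-1" />

  <xsl:template match="/">
    <xsl:variable name="embedded-text"
        select="//ns:Input/text()"/>
    <!-- First non-whitespace character of the embedded text -->
    <xsl:variable name="first-non-ws"
        select="substring(normalize-space($embedded-text), 1, 1)"/>
    <!-- Number of whitespace characters that precede it -->
    <xsl:variable name="leading-ws-count"
        select="string-length(substring-before($embedded-text, $first-non-ws))"/>
    <!-- Emit everything from the first non-whitespace character onward -->
    <xsl:value-of select="substring($embedded-text, $leading-ws-count + 1)"/>
  </xsl:template>

</xsl:stylesheet>
```

Keep in mind that the encoding attribute only has an effect when the XSLT processor serializes the result to bytes itself. If your code hands the result to a string or character stream, the encoding is decided wherever that stream is eventually written.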

Upvotes: 1

Eiríkr Útlendi

Reputation: 1180

The following is a fixed version of your XSL code.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns0="http://sertex.com/Consult"
    version="1.0">
    <xsl:output method="xml" omit-xml-declaration="yes" />
    <xsl:template match="/">
        <xsl:value-of select="normalize-space(//ns0:Input/text())" disable-output-escaping="yes" />
    </xsl:template>
</xsl:stylesheet>

The key fixes:

  1. I changed the namespace to match the namespace of the input XML. This ensures that the templates in the XSL actually match the elements in the input XML.
  2. I added normalize-space() to trim whitespace in the CDATA text. This gets rid of the leading and trailing whitespace, producing output that parses as valid XML.
    Note: this also collapses every run of whitespace inside the CDATA text, newlines included, into a single space. So if the newlines are important to you, this approach will not work.
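To illustrate the effect of normalize-space() on whitespace, here is a small standalone stylesheet. The sample string is made up for the demonstration and is unrelated to your actual data; the stylesheet ignores its input document, so it can be run against any XML file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Made-up sample: leading spaces, then a small document spread over
         several lines (&#10; is a newline), then a trailing newline. -->
    <xsl:variable name="sample"
        select="'  &#10;&lt;a&gt;&#10;  &lt;b/&gt;&#10;&lt;/a&gt;&#10;'"/>
    <!-- Brackets make the start and end of the result visible -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="normalize-space($sample)"/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

This prints [<a> <b/> </a>]: the leading and trailing whitespace is gone, and each newline with its indentation has become a single space.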

Upvotes: 0
