imran

Reputation: 461

Entity within processing instruction getting changed

I have an XML file.

  1. I am transforming a processing instruction into an element.
  2. I am taking the value of the processing instruction into an attribute.
  3. The problem is that there is an entity (&#160;) within the processing instruction that is getting changed to &amp;#160;.
  4. I want the entity to remain as it is.

    <element>
        <?comment adtxt="hello &#160; Guys"?>
    </element>
    

My XSLT code:

    <xsl:template match="element">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="processing-instruction(comment)">
        <inddq>
            <xsl:attribute name="adtxt">
                <xsl:value-of select="."/>
            </xsl:attribute>
            <xsl:processing-instruction name="comment">
                <xsl:value-of select="."/>
            </xsl:processing-instruction>
        </inddq>
    </xsl:template>

And the output I am getting:

    <element>
    <inddq adtxt="adtxt=&#34;hello &amp;#160; Guys&#34;">
    <?comment adtxt="hello &#160; Guys"?>
    </inddq>
    </element>

Thanks in advance.

Upvotes: 0

Views: 164

Answers (1)

Martin Honnen

Reputation: 167571

This is a tricky issue: the content of a processing instruction is not parsed as XML. See https://www.w3.org/TR/REC-xml/#sec-pi, which says

PIs are not part of the document's character data
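To see what this means in practice, here is a small sketch using Python's standard xml.dom.minidom (an illustration outside XSLT, not part of the original answer). The parser resolves &#160; to U+00A0 in an attribute value or a text node, but hands the PI data over verbatim. Copying that verbatim string into an attribute, as the question's xsl:value-of select="." does, forces the serializer to escape the & as &amp;, which is where the question's &amp;#160; comes from. The extra <p> element is a hypothetical addition for comparison, and minidom writes &quot; where Saxon writes &#34;.

```python
from xml.dom import minidom

# The question's PI plus a hypothetical <p> element (added only for comparison)
# that carries the same character reference in an attribute and in text.
SOURCE = ('<element><?comment adtxt="hello &#160; Guys"?>'
          '<p a="hello &#160; Guys">hello &#160; Guys</p></element>')


def read_nodes(source):
    """Return (PI data, attribute value, text value) as seen after parsing."""
    root = minidom.parseString(source).documentElement
    pi = next(n for n in root.childNodes
              if n.nodeType == n.PROCESSING_INSTRUCTION_NODE)
    p = root.getElementsByTagName("p")[0]
    return pi.data, p.getAttribute("a"), p.firstChild.data


def pi_data_as_attribute(source):
    """Copy the raw PI data into an attribute, like xsl:value-of select="."."""
    pi_data = read_nodes(source)[0]
    inddq = minidom.Document().createElement("inddq")
    inddq.setAttribute("adtxt", pi_data)
    # The serializer must escape '&' and '"' inside the attribute value.
    return inddq.toxml()


pi_data, attr_value, text_value = read_nodes(SOURCE)
print(repr(pi_data))     # '&#160;' is still six literal characters
print(repr(attr_value))  # the reference was resolved to U+00A0
print(repr(text_value))  # the reference was resolved to U+00A0
print(pi_data_as_attribute(SOURCE))
```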

So, as you seem to want the XML character reference to be interpreted by an XML parser and later output again as &#160;, you need to parse that content as XML, and a clean solution requires XSLT 3 with

  1. parse-xml-fragment
  2. use of a character map
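For readers without an XSLT 3 processor at hand, the two steps can be mimicked outside XSLT. The Python sketch below is only an analogy, not a replacement for the stylesheet: wrapping the PI data in an arbitrary dummy element (the name wrapper is made up) and parsing it stands in for parse-xml-fragment, and a plain string substitution on the serialized result stands in for the character map, which likewise writes its replacement string without escaping it. As with parse-xml-fragment, PI data that is not well-formed content would raise a parse error.

```python
import xml.etree.ElementTree as ET

# Stand-in for <xsl:character-map name="m1">: character -> output string.
CHARACTER_MAP = {"\u00a0": "&#160;"}


def parse_fragment_text(fragment):
    """Rough analogue of string(parse-xml-fragment($fragment)).

    The fragment is wrapped in an arbitrary element so that the parser
    resolves character references such as &#160; to U+00A0.
    """
    wrapper = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    return "".join(wrapper.itertext())


def apply_character_map(serialized, cmap):
    """Replace mapped characters in already-serialized output, unescaped."""
    for char, replacement in cmap.items():
        serialized = serialized.replace(char, replacement)
    return serialized


def transform_pi(target, data):
    """Build <inddq adtxt="..."> holding a copy of the PI, then serialize."""
    inddq = ET.Element("inddq", adtxt=parse_fragment_text(data))
    # Copy the processing instruction itself unchanged.
    inddq.append(ET.ProcessingInstruction(target, data))
    serialized = ET.tostring(inddq, encoding="unicode")
    return apply_character_map(serialized, CHARACTER_MAP)


print(transform_pi("comment", 'adtxt="hello &#160; Guys"'))
```

The printed element carries the same data as the XSLT 3 result in the answer; only the quoting style differs (&quot; instead of &#34;).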

So the stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output use-character-maps="m1"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:character-map name="m1">
    <xsl:output-character character="&#160;" string="&amp;#160;"/>
  </xsl:character-map>

  <xsl:template match="processing-instruction(comment)">
    <inddq>
      <xsl:attribute name="adtxt">
        <xsl:value-of select="parse-xml-fragment(.)"/>
      </xsl:attribute>
      <xsl:processing-instruction name="comment">
        <xsl:value-of select="."/>
      </xsl:processing-instruction>
    </inddq>
  </xsl:template>

</xsl:stylesheet>

would transform

<element>
         <?comment adtxt="hello &#160; Guys"?>
        </element>

with an XSLT 3 processor such as Saxon 9.8 (https://xsltfiddle.liberty-development.net/eiZQaG3) or 9.9, or Altova 2017 or 2018, into

<element>
         <inddq adtxt='adtxt=&#34;hello &#160; Guys&#34;'><?comment adtxt="hello &#160; Guys"?></inddq>
        </element>

Note, however, that this does not preserve the character reference inside the processing instruction's data. It parses that data as XML, and then, on output, the character map replaces every Unicode non-breaking space character with the sequence &#160;, the numeric character reference for that character.

The approach can of course be extended to other character references, but in any case the character map is applied to every character in the output; it is not possible to restrict it to the adtxt attribute value.
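Both caveats show up in the same kind of Python analogy (again a sketch outside XSLT, with a post-serialization string substitution standing in for the character map). A hexadecimal reference &#xA0; and a literal non-breaking space in the PI data both come out as &#160;, because what survives parsing is the character, not the way it was written. A non-breaking space in unrelated content (a hypothetical <note> element here) is rewritten as well, because the substitution runs over the whole serialized output.

```python
import xml.etree.ElementTree as ET

CHARACTER_MAP = {"\u00a0": "&#160;"}


def parse_fragment_text(fragment):
    """Rough analogue of string(parse-xml-fragment($fragment))."""
    wrapper = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    return "".join(wrapper.itertext())


def serialize_with_map(root, cmap):
    """Serialize, then apply the map to every character of the output."""
    serialized = ET.tostring(root, encoding="unicode")
    for char, replacement in cmap.items():
        serialized = serialized.replace(char, replacement)
    return serialized


def build(pi_data, note_text):
    """<element> with an <inddq> built from the PI and an unrelated <note>."""
    root = ET.Element("element")
    ET.SubElement(root, "inddq", adtxt=parse_fragment_text(pi_data))
    # Hypothetical <note>, added only to show that the map is not scoped.
    ET.SubElement(root, "note").text = note_text
    return serialize_with_map(root, CHARACTER_MAP)


# Decimal reference, hexadecimal reference and a literal U+00A0 all end up
# as the same output sequence &#160;, so the original spelling is lost.
for data in ("hello &#160; Guys", "hello &#xA0; Guys", "hello \u00a0 Guys"):
    print(build(data, "no\u00a0break"))
```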

As an alternative to the XSLT/XPath 3 function parse-xml-fragment, you could use replace, as done in https://xsltfiddle.liberty-development.net/eiZQaG3/1, but that approach still needs the character map.
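The replace idea can be sketched the same way. The exact expression used in the linked fiddle is not reproduced here; the Python below simply assumes that the literal six-character text &#160; in the PI data is swapped for U+00A0 (roughly what an XPath replace call would do) before serialization. Without the character-map step, this Python serializer writes the raw U+00A0 character, and only references listed explicitly in the (illustrative, assumed) table are handled: any other & in the data is still escaped as &amp;.

```python
import xml.etree.ElementTree as ET

# Literal reference text in the PI data -> the character it stands for.
# This table is an illustrative assumption; only its entries are handled.
REFERENCES = {"&#160;": "\u00a0"}
CHARACTER_MAP = {"\u00a0": "&#160;"}


def replace_references(pi_data, refs):
    """String-level replacement, analogous to an XPath replace() call."""
    for literal, char in refs.items():
        pi_data = pi_data.replace(literal, char)
    return pi_data


def to_inddq(pi_data, use_map=True):
    """Serialize <inddq adtxt="...">, optionally applying the character map."""
    inddq = ET.Element("inddq", adtxt=replace_references(pi_data, REFERENCES))
    serialized = ET.tostring(inddq, encoding="unicode")
    if use_map:
        for char, replacement in CHARACTER_MAP.items():
            serialized = serialized.replace(char, replacement)
    return serialized


print(to_inddq("hello &#160; Guys"))                       # map restores &#160;
print(repr(to_inddq("hello &#160; Guys", use_map=False)))  # raw U+00A0 instead
print(to_inddq("hello &#160; &#8212; Guys"))               # &#8212; not in table
```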

Upvotes: 4
