Pierre François

Reputation: 6073

Why is xsltproc converting accented characters into hexadecimal entities?

I have the following HTML file, called input.html, from which I want to extract XML fragments:

<!DOCTYPE html>
<div>Text with ó</div>

I apply this XSL stylesheet, named stylesheet.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" />

  <xsl:template match="div">
    <tag attribute="{child::text()}"></tag>
  </xsl:template>

</xsl:stylesheet>

When I run xsltproc stylesheet.xsl input.html, I want to get the following result:

<?xml version="1.0"?>
<tag attribute="Text with ó"/>

but instead, I get unwanted hexadecimal entities in the attribute:

<?xml version="1.0"?>
<tag attribute="Text with &#xF3;"/>

How can I avoid these unwanted hexadecimal entities without having to translate every possible entity back, as explained in "XSL: how do I keep xsltproc from tampering with an escaped HTML string in an attribute value?"

Upvotes: 0

Views: 38

Answers (1)

michael.hor257k

Reputation: 117102

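Add encoding="UTF-8" to your xsl:output instruction. With no encoding specified, xsltproc serializes the result in an ASCII-safe form and writes characters outside ASCII, such as ó, as numeric character references.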
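For illustration, here is a sketch of the stylesheet from the question with that one attribute added to xsl:output. Everything else is unchanged, and it assumes input.html itself is saved as UTF-8:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- encoding="UTF-8" asks the serializer to emit UTF-8 bytes
       instead of escaping non-ASCII characters -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />

  <xsl:template match="div">
    <tag attribute="{child::text()}"></tag>
  </xsl:template>

</xsl:stylesheet>
```

Running xsltproc stylesheet.xsl input.html again should then write the ó literally in the attribute value, and the XML declaration in the output should carry encoding="UTF-8" as well.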

Upvotes: 1
