I have an XML document that is encoded as UTF-16 LE. Because of that, it is not readable by WP All Import.
When I look at the XML declaration, I see this:
<?xml version="1.0" encoding="Unicode" ?>
And at the bottom of Visual Studio Code I see:
UTF-16 LE
I already converted it using Visual Studio, but since there is going to be a new file every time (in the same format), it would be great if PHP could transform it into UTF-8.
<?xml version="1.0" encoding="Unicode" ?>
<root>
<docs>
Is it possible to change the encoding of this file using PHP?
Upvotes: 0
Views: 843
Reputation: 19512
DOMDocument::loadXML() reads the encoding attribute from the XML declaration. However, Unicode is not a valid encoding as far as I know; I would expect UTF-16LE. The DOM API in PHP uses UTF-8, so it decodes anything to UTF-8 on loading (based on the declared encoding) and encodes the output based on the encoding of the target document. You can simply change that encoding after loading.
Here is a demo:
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<foo>ÄÖÜ</foo>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
$encodings = ['ASCII', 'UTF-16', 'UTF-16LE', 'UTF-16BE'];
foreach ($encodings as $encoding) {
    // set the required encoding
    $document->encoding = $encoding;
    // serialize and print the document in that encoding
    echo $encoding."\n".$document->saveXML()."\n";
}
Output:
ASCII
<?xml version="1.0" encoding="ASCII"?>
<foo>&#196;&#214;&#220;</foo>
UTF-16
��<?xml version="1.0" encoding="UTF-16"?>
<foo>���</foo>
UTF-16LE
<?xml version="1.0" encoding="UTF-16LE"?>
<foo>���</foo>
UTF-16BE
<?xml version="1.0" encoding="UTF-16BE"?>
<foo>���</foo>
The generated string changes with the defined encoding.
I started with a UTF-8 document here because SO itself is UTF-8, so you can see the non-ASCII characters that way. ASCII triggers entity encoding for the non-ASCII characters. UTF-16 adds a BOM to indicate the byte order. SO cannot display the UTF-16 encoded characters, so you get the � symbol. UTF-16LE and UTF-16BE define the byte order in the encoding name, so no BOM is needed.
Of course, it works the same way in the other direction.
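For the reverse direction, a minimal sketch might look like the following. It assumes the mbstring and DOM extensions are available, and it builds a small UTF-16LE sample in memory instead of reading the real file. Note that the sample declares encoding="UTF-16" rather than the "Unicode" label from the question; whether libxml accepts "Unicode" is not confirmed here, so a declaration like that may need to be corrected before loading.

```php
<?php
// In-memory UTF-16LE sample with a BOM (a stand-in for the real file's bytes).
$sample = '<?xml version="1.0" encoding="UTF-16"?><root><docs>ÄÖÜ</docs></root>';
$utf16 = "\xFF\xFE" . mb_convert_encoding($sample, 'UTF-16LE', 'UTF-8');

// Load the UTF-16 bytes; the DOM decodes them based on the BOM and declaration.
$document = new DOMDocument();
$document->loadXML($utf16);

// Switch the target encoding and serialize again.
$document->encoding = 'UTF-8';
$utf8 = $document->saveXML();

echo $utf8;
```

With a real file, the string passed to loadXML() would come from something like file_get_contents(), and the result could be written back with file_put_contents().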
Upvotes: 1
Reputation: 22301
Here is a generic XSLT that copies your entire input XML as-is, but with the encoding specified in xsl:output. All that is left is to run the XSLT transformation in PHP.
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="utf-8"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
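Running this stylesheet from PHP could be sketched roughly as follows. This assumes the xsl, DOM and mbstring extensions are enabled. The UTF-16LE input is built in memory as a stand-in for the real file, with a declaration of encoding="UTF-16" (not the "Unicode" label from the question, which may need fixing before the document will load).

```php
<?php
// The identity stylesheet, with UTF-8 as the output encoding.
$xslt = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="utf-8"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
XSL;

// In-memory UTF-16LE sample with a BOM (a stand-in for the real input file).
$sample = '<?xml version="1.0" encoding="UTF-16"?><root><docs>ÄÖÜ</docs></root>';
$utf16 = "\xFF\xFE" . mb_convert_encoding($sample, 'UTF-16LE', 'UTF-8');

$source = new DOMDocument();
$source->loadXML($utf16);

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xslt);

// Apply the stylesheet; xsl:output decides the encoding of the result string.
$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);
$result = $processor->transformToXml($source);

echo $result;
```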
Upvotes: 1