I am trying to generate an XML file in UTF-16 encoding with PHP, but there is a problem when I open the generated file. I use DOMDocument to create the file. With UTF-8 encoding there is no problem. When I open the XML file in Notepad++, it looks like this:
```
<?xml version="1.0" encoding="UTF-16"?>㰀伀䈀㸀ഀ
<CLIENT> 㰀䈀伀䴀㸀ഀ
<BO> 㰀䄀搀洀䤀渀昀漀㸀ഀ
<Object>2</Object> 㰀嘀攀爀猀椀漀渀㸀㈀㰀⼀嘀攀爀猀椀漀渀㸀ഀ
</AdmInfo> 㰀䈀甀猀椀渀攀猀猀倀愀爀琀渀攀爀猀㸀ഀ
<row>
㰀䌀愀爀搀吀礀瀀攀㸀㠀㰀⼀䌀愀爀搀吀礀瀀攀㸀ഀ
```
...and so on. Can someone help me, please?
When I set the encoding in Notepad++ to UTF-8 without BOM, the file looks like this:
```
<?xml version="1.0" encoding="UTF-16"?>㰀伀䈀㸀ഀ
<CLIENT> 㰀䈀伀䴀㸀ഀ
<BO> 㰀䄀搀洀䤀渀昀漀㸀ഀ
<Object>2</Object> 㰀嘀攀爀猀椀漀渀㸀㈀㰀⼀嘀攀爀猀椀漀渀㸀ഀ
</AdmInfo> 㰀䈀甀猀椀渀攀猀猀倀愀爀琀渀攀爀猀㸀ഀ
<row> 㰀䌀愀爀搀吀礀瀀攀㸀㠀㰀⼀䌀愀爀搀吀礀瀀攀㸀ഀ
<CardCode>01000001</CardCode> 㰀⼀爀漀眀㸀ഀ
</BusinessPartners> 㰀⼀䈀伀㸀ഀ
</BOM> 㰀⼀䌀䰀䤀䔀一吀㸀ഀ
```
Here is part of the PHP file, as requested:
```php
header('Content-Type: text/xml');
//header('Content-Transfer-Encoding: binary');
$xml = new DOMDocument();
$xml->version='1.0';
$xml->encoding='UTF-16';
$ob_client = $xml->createElement('OB');
$client_element = $xml->createElement('CLIENT');
$client_bom_element = $xml->createElement('BOM');
$client_bo_element = $xml->createElement('BO');
$client_adminfo_element = $xml->createElement('AdmInfo');
$client_adminfo_object_element = $xml->createElement('Object', '2');
$client_adminfo_version_element = $xml->createElement('Version', '2');
$client_BusinessPartners_element = $xml->createElement('BusinessPartners');
$client_BusinessPartners_row_element = $xml->createElement('row');
$client_BusinessPartners_row_cardtype_element = $xml->createElement('CardType', $_XML_CardType);
$client_BusinessPartners_row_cardcode_element = $xml->createElement('CardCode', $_XML_CardCode);
...
$xml->formatOutput = true;
echo $xml->saveXML();
$xml->save('rudy-xml-particulier'.$commandeId.'.xml');
```
Thanks a lot.
You are already generating a UTF-16 XML file. All you need to do is specify the encoding up front, which you already do:
```php
$doc = new DOMDocument();
$doc->encoding='UTF-16';
```
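A minimal, self-contained sketch of that point: with `encoding` set, `saveXML()` serializes the whole document as UTF-16, provided the strings you hand to DOM are UTF-8. The element names and the value `8` are only illustrative, borrowed from the output in the question.

```php
<?php
// The DOM extension works with UTF-8 strings internally; the encoding
// property only controls how saveXML()/save() serialize the tree.
$doc = new DOMDocument('1.0');
$doc->encoding = 'UTF-16';
$doc->formatOutput = true;

// Illustrative elements and value, taken from the question's output.
$row = $doc->createElement('row');
$row->appendChild($doc->createElement('CardType', '8'));
$doc->appendChild($row);

// The UTF-8 tree is converted to UTF-16 on output, so every ASCII
// character occupies two bytes, one of which is a NUL byte.
$out = $doc->saveXML();
echo bin2hex(substr($out, 0, 16)), PHP_EOL;
```

Dumping the first bytes in hex is a quick way to confirm the file is really UTF-16 regardless of how an editor chooses to display it.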
So the problem more likely occurs when you add data, especially element values. PHP neither warns you nor prevents you from adding non-UTF-8 byte sequences. Here is an example that provokes exactly that:
```php
$_XML_CardType = "\xA9"; # non-UTF-8 byte sequence (Latin-1 copyright symbol)
$xml->createElement('CardType', $_XML_CardType); # still returns a DOMElement
```
Then, when you call
```php
echo $xml->saveXML();
```
PHP may report the problem (depending on the PHP version, error-reporting settings and underlying libraries) and, in newer PHP versions, cut the output off at the point where the error occurs. An example error message:
```
Warning: DOMDocument::saveXML(): output conversion failed due to conv error, bytes 0xA9 0x3C 0x2F 0x69
```
Therefore, all you need to do is ensure that the string data you pass to `createElement` as the value is UTF-8 encoded. That is all there is to it.
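One way to enforce this, sketched under an explicit assumption: if you know that any value which is not valid UTF-8 is ISO-8859-1 (Latin-1), as the `\xA9` example is, you can convert it before calling `createElement`. The helper name `toUtf8` is hypothetical, and the conversion uses `mb_convert_encoding`, so ext/mbstring must be available. It is a heuristic: Latin-1 text that happens to be valid UTF-8 passes through unchanged.

```php
<?php
// Hypothetical helper: return $value as UTF-8, ASSUMING that any string
// which is not already valid UTF-8 is ISO-8859-1. Requires ext/mbstring.
function toUtf8(string $value): string
{
    // preg_match() with the /u modifier returns false on invalid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$xml = new DOMDocument('1.0');
$xml->encoding = 'UTF-16';

// The Latin-1 copyright sign 0xA9 becomes the UTF-8 sequence 0xC2 0xA9.
$cardType = toUtf8("\xA9");
$xml->appendChild($xml->createElement('CardType', $cardType));
echo bin2hex($cardType), PHP_EOL; // c2a9
```

Fixing the encoding at the source (the database connection) is still preferable; a conversion like this only helps when you cannot control where the data comes from.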
Since you say you fetch the data from a database, consult the documentation of your PHP database client library on how to make it return strings in UTF-8. That should solve your issue immediately.
To make sure you actually get a UTF-8 string, test it before you insert it, for example with a regex that detects invalid UTF-8 strings:
```php
// preg_match() with the /u modifier returns false if the subject is not valid UTF-8
if (!preg_match('//u', $_XML_CardType)) {
    throw new Exception("Non-UTF-8 string detected.");
}
$xml->createElement('CardType', $_XML_CardType);
```
This throws an exception instead of inserting the invalid data. Also log or display errors and watch the error output to spot further problems.
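For the "log/display errors" part, a small sketch of how a serialization warning such as "output conversion failed" can be turned into an exception instead of silently producing truncated output. `saveXmlOrFail` is a hypothetical helper name; `set_error_handler`, `restore_error_handler` and `ErrorException` are standard PHP.

```php
<?php
// Report everything while debugging so DOM/libxml warnings are not hidden.
error_reporting(E_ALL);

// Hypothetical helper: serialize $doc, converting any PHP warning raised
// during saveXML() into an ErrorException.
function saveXmlOrFail(DOMDocument $doc): string
{
    set_error_handler(static function (int $severity, string $message, string $file, int $line): bool {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });
    try {
        $xml = $doc->saveXML();
    } finally {
        restore_error_handler(); // always restore the previous handler
    }
    if ($xml === false) {
        throw new RuntimeException('DOMDocument::saveXML() failed');
    }
    return $xml;
}

$doc = new DOMDocument('1.0');
$doc->encoding = 'UTF-16';
$doc->appendChild($doc->createElement('CardCode', '01000001'));

try {
    $out = saveXmlOrFail($doc);
    echo strlen($out), " bytes written", PHP_EOL;
} catch (ErrorException $e) {
    // Log the problem instead of shipping a half-written document.
    error_log('XML serialization problem: ' . $e->getMessage());
}
```

Whether a given invalid byte actually triggers a warning still depends on the PHP and libxml versions, as noted above, so the input check remains the primary safeguard.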