Reputation: 4882
Take this simple PHP code:
$xmlWriter = new XMLWriter();
$xmlWriter->openURI('php://output');
$xmlWriter->startDocument('1.0', 'utf-8');
$xmlWriter->writeElement('test', $data);
$xmlWriter->endDocument();
$xmlWriter->flush();
The XMLWriter class has a nice feature: it will convert any data you give it to the output encoding. For example, here it will convert $data to UTF-8 because I passed 'utf-8' to the startDocument function.
The problem is that in my case the content of $data comes from a database whose output format is UTF-8, so it is already in UTF-8. XMLWriter probably thinks the data is in ISO-8859-1 and converts it to UTF-8 again, and I get weird symbols where I should get accents.
Currently I'm using utf8_decode around each string coming from the database, which means I'm converting from UTF-8 to ISO-8859-1, and then XMLWriter turns it back into UTF-8.
This works but is not clean:
$xmlWriter->writeElement('test', utf8_decode($data));
Is there a cleaner solution?
EDIT: showing a full example
$xmlWriter = new XMLWriter();
$xmlWriter->openURI('php://output');
$xmlWriter->startDocument('1.0', 'utf-8');
$xmlWriter->startElement('usersList');
$database = new PDO('mysql:host=localhost;dbname=xxxxx', 'xxxxx', 'xxxxx');
$database->exec('SET CHARACTER SET UTF8');
$database->exec('SET NAMES UTF8');
foreach ($database->query('SELECT name FROM usersList') as $user) {
    $xmlWriter->writeElement('user', $user[0]); // if the user's name is 'hervé' in the database, it will print 'hervÃ©' instead
}
$xmlWriter->endElement();
$xmlWriter->endDocument();
$xmlWriter->flush();
Upvotes: 2
Views: 4207
Reputation: 31621
I'm not sure where you got the idea that XMLWriter converts encodings. It doesn't. You must supply it with UTF-8. It can output different encodings, but input strings must be UTF-8.
One of two things may be going on here:
header('Content-Type: application/xml; charset=UTF-8');
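If you want to check which side is at fault, here is a minimal sketch, assuming the mbstring and xmlwriter extensions are loaded. The sample value in $data is hypothetical and stands in for a string fetched from your database. It verifies that the input is valid UTF-8, writes it into an in-memory buffer instead of php://output, and dumps the result as hex so you can look at the actual bytes rather than at what a browser or terminal decides to display.

```php
<?php
// Hypothetical sample value standing in for a database string:
// 'hervé' written out as explicit UTF-8 bytes (é = 0xC3 0xA9).
$data = "herv\xC3\xA9";

// Confirm the input really is valid UTF-8 before handing it to XMLWriter.
if (!mb_check_encoding($data, 'UTF-8')) {
    throw new RuntimeException('Input is not valid UTF-8');
}

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();                    // buffer in memory so the bytes can be inspected
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->writeElement('test', $data);
$xmlWriter->endDocument();
$xml = $xmlWriter->outputMemory();

// Hex dump of the generated document.
echo bin2hex($xml), "\n";
```

If the input passes through unchanged, the sequence c3a9 should show up in the dump right after the bytes for "herv". If you see c383c2a9 there instead, the string was encoded twice somewhere before it reached the output.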
Upvotes: 7